Monday, June 29, 2015

Machine learning with malware part 2: Model selection

Unrelated background information:
I wish I'd gotten a lot further a lot sooner with this. Also, this blog post is only barely up to the standard I set for myself. Compressing so much statistics and machine learning into a blog post of reasonable length while avoiding excessive math is a terribly difficult thing to do. I hope that it lifts the veil a bit around what machine learning is and how it can be leveraged in a malware setting, but if you really wish to get down with it, there can be no two opinions: you need a real textbook with all the rigor that requires.

It also took a long time because most of my family conspired to celebrate round birthdays. Climbing season started too, which keeps me away from the computer. Finally, I have this feeling that the "Predator" from the Machine Learning Part 1 blog produced an insufficient number of features. So I spent a significant amount of time embedding my code emulator (from "Unpacking with primitive emulation") into Predator, at first thinking it would be easy. Emulating 80000 malformed files, where a significant number are missing DLLs etc., turned out to be way more time consuming than I imagined.

The blog post is here:
Machine learning with malware part 2: Model Selection

Thursday, June 18, 2015

LangSec opinions and a case INTO overflows

On an entirely different note: The next machine learning and malware post is almost finished...

A very short introduction to LangSec

Sassaman, Patterson & Bratus give a better and more formal introduction, and they are probably less boring than I am and definitely more accurate.
When I first read about LangSec (language security) I did as I always do when reading about a new topic: boil it down as much as I can. My distillate was “If you don’t make errors, your code will be secure”. It is a truism and an annoying one at that, because it either asks the impossible or points fingers at you for being what you are - a human. It is in many ways still how I think of LangSec, but digging below the surface of this new computer science discipline is nevertheless very much worthwhile. In fact I think it offers one of the most promising ways of approaching one of the roots of the insecurity problem, which is of course why I bother to write about it. Though it is mostly defensive, anybody playing offense might benefit from the way of thinking.

I think the two most important insights I’ve found in LangSec are:
1)      User input should not be treated as data. Once user input is not what was expected, it is no longer data; it’s a program running on a computer made up of your software, your hardware and their state: a so-called “weird machine”. A program that runs on the weird machine does weird things, and weird is bad.
2)      Postel’s law states that you should be “liberal in what data you accept, and strict in what you send”. Kill this law and replace it with “Be strict with user data”.
The reason why 1) is really important is that it translates a security problem, which is often diffuse, into a very well defined language theoretic problem. Writing secure code becomes a question of developing a language that is well defined for any user data. What I mean by "well defined" is backed by standard theory. Awesome!
If 1) describes the way to think about information security, 2) is a large part of how you do it in practice. Being strict with input and output radically reduces the chance that we as coders will fall back on assumptions. Assumptions are the mother of all f-ups.

Despite not working in classical infosec, I’ve spent a significant part of my career exploiting that people made their protocols more complex than they had to, or that those who implemented the protocol weren’t particularly strict in interpreting it. As an example, I once developed a DVD video copy protection. Though it isn’t infosec, it’s very much an exercise in exploiting that programmers had not taken 1) and 2) to heart. Part of that copy protection is just a Denial of Service (DoS) attack on ripping software. Three components made copy protection for video DVD possible in the first place. The first is that the DVD Video specification has inconsistencies, undefined behavior and unnecessary flexibility, and it is huge and confusing. The complete documentation takes up about two feet of shelf space. This almost surely makes programmers of rippers and players offer me a weird machine to program in the first place. Secondly, neither DVD rippers nor players are strict with the input data. The third element is that rippers and players react differently to fringe input. The challenge then boils down to writing a program for the weird machine in rippers that'll cause denial of service of some kind, while making sure that particular program does not disrupt players.

A lot of LangSec has focused on parsers (and the "reverse parser", that is, building the data to be parsed) and this seems reasonable. With two shelf-feet of documentation, most of it written only in the notoriously difficult to parse language of human-readable English, errors are bound to be made when implementing it. LangSec has recommendations on how you can improve the process of writing the documentation in the first place. For instance, replace .h files with something that also describes the relations between the fields. LangSec also has recommendations on how you should deal with implementing a parser, and this is something most coders should read up on and take to heart. It will significantly improve the security of software. It's a different angle of attack than the classic approach of leaving security to compilers, operating systems and hardware. Now I'm a great fan of strcpy_s type functions, ASLR, DEP, CFG, sandboxes etc. and all the approaches made in this spirit, but they obviously aren't sufficient for security.

A real life integer overflow programming error

Below I have listed the sanity checking of the Import Directory in Sebastian Porst's PeLib (ImportDirectory.h). I've chosen this as an example of a classic int overflow problem, for a couple of reasons. The first reason is that I'd stumbled upon it recently and thus it was readily available to me. The second reason is that it's a pretty classic example of not taking 1) and 2) above to heart. The third is that it's written by somebody who was sensitive to security issues. Mr. Porst has made himself an impressive career in infosec. Yet he made a mistake. I'm not arguing that Mr. Porst is a bad coder (quite the contrary). I'm arguing that if he made such a mistake, most programmers not only can but are likely to make this kind of mistake at one point or another. Before this leads to misunderstandings: I consider Mr. Porst's code generally of good quality, and Mr. Porst did indeed have good reasons to ignore 1) and 2) above - he wanted his lib to be able to repair broken PE files, which means he cannot be strict when parsing PEs.
So let's turn our attention to the code:

       * Read an import directory from a file.
       * \todo Check if streams failed.
       * @param strFilename Name of the file which will be read.
       * @param uiOffset Offset of the import directory (see #PeLib::PeHeader::getIDImportRVA).
       * @param uiSize Size of the import directory (see #PeLib::PeHeader::getIDImportSize).
       * @param pehHeader A valid PE header.
1: int ImportDirectory<bits>::read(const std::string& strFilename, unsigned int
2:     uiOffset, unsigned int uiSize, const PeHeaderT<bits>& pehHeader)
3: {
4:     std::ifstream ifFile(strFilename.c_str(), std::ios_base::binary);
5:     if (!ifFile)
6:     {
7:            return ERROR_OPENING_FILE;
8:     }
10:    unsigned int uiFileSize = fileSize(ifFile);
12:    if (uiFileSize < uiOffset + uiSize)
13:    {
14:           return ERROR_INVALID_FILE;
15:    }
16: ifFile.seekg(uiOffset, std::ios_base::beg);
17: std::vector<unsigned char> vImportdirectory(uiSize);
18: ifFile.read(reinterpret_cast<char*>(&vImportdirectory[0]), uiSize);

20: unsigned int uiDesccounter = 0;

21: InputBuffer inpBuffer(vImportdirectory);

22: std::vector<PELIB_IMAGE_IMPORT_DIRECTORY<bits> > vOldIidCurr;

23: do // Read and store all descriptors
24: {
25:    inpBuffer >> iidCurr.impdesc.OriginalFirstThunk;
26:    inpBuffer >> iidCurr.impdesc.TimeDateStamp;
27:    inpBuffer >> iidCurr.impdesc.ForwarderChain;
28:    inpBuffer >> iidCurr.impdesc.Name;
29:    inpBuffer >> iidCurr.impdesc.FirstThunk;
31:    if (iidCurr.impdesc.OriginalFirstThunk != 0 || iidCurr.impdesc.TimeDateStamp !=
32:           0 || iidCurr.impdesc.ForwarderChain != 0 ||
33:          iidCurr.impdesc.Name != 0 || iidCurr.impdesc.FirstThunk != 0)
34:    {
35:           vOldIidCurr.push_back(iidCurr);
36:    }
38:    uiDesccounter++;
40:    if (uiSize < (uiDesccounter + 1) * PELIB_IMAGE_IMPORT_DESCRIPTOR::size()) break;
41: } while (iidCurr.impdesc.OriginalFirstThunk != 0 ||
42:    iidCurr.impdesc.TimeDateStamp != 0      ||
43:    iidCurr.impdesc.ForwarderChain != 0 ||
44:    iidCurr.impdesc.Name != 0 || iidCurr.impdesc.FirstThunk != 0);


Though there are a few layers of code above this, the "uiSize" and "uiOffset" parameters are essentially unverified user data (uiOffset is checked against 0, but there are no other checks). The parameters are verified in line 12. What Mr. Porst must have thought is pretty clear: if the sum of these two is bigger than the file size, something is wrong. What he forgot is that uiFileSize > uiOffset + uiSize if uiOffset = 0xFFFFFFFF and uiSize = 2, because the unsigned addition overflows and wraps around to 1 [1]. In fact we can craft PE files with arbitrary values of uiOffset, and that is not expected. We are now programming a weird machine. In Mr. Porst's code what we can do with our overflow error is fairly limited. We can cause a regular crash, but beyond that the code looks solid - see what happens if we use uiSize=0xFFFFFFFF and uiOffset=2. Now suppose we changed lines 16, 17, 18 and 21 a wee bit, so that we read the entire file and just parse the right portion of it instead of reading only the import section:
16: ifFile.seekg(0, std::ios_base::beg);
17: std::vector<unsigned char> vImportdirectory(uiFileSize /*uiSize*/);
18: ifFile.read(reinterpret_cast<char*>(&vImportdirectory[0]), uiFileSize /*uiSize*/);
21: InputBuffer inpBuffer(vImportdirectory); inpBuffer.set(uiOffset);

In the well behaved case everything remains functional. But we now have the potential to leak information. The point is that the sanity checking doesn't work. With uiSize > uiFileSize and uiOffset chosen so that the check in line 12 still passes, we'd be able to read as far beyond the buffer allocated by the vector as we want. If some web service uses this function to dump the imports to the user, we'd be able to dump the heap content of the web service that follows the vector from line 17, and that might contain information not meant for anybody's eyes - which can be quite valuable to attackers; go google Heartbleed. If we had a write operation instead, we'd be writing arbitrary memory, and with a memory full of user data and lots of vtables lying around we'd have code execution in no time! It's pretty much standard exploit stuff - except we'll call it by a different name: programming the weird machine.
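
The wrap-around in the line 12 check can be reproduced in isolation. The following is only a minimal sketch: naiveRangeAccepted is a hypothetical helper of mine, not part of PeLib, and it assumes 32-bit unsigned arithmetic like the original parameters.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the shape of the check on line 12.
// Returns true when the (offset, size) pair would be accepted.
bool naiveRangeAccepted(std::uint32_t fileSize, std::uint32_t offset, std::uint32_t size) {
    // The sum is computed modulo 2^32, so it can wrap around to a small value.
    const std::uint32_t end = static_cast<std::uint32_t>(offset + size);
    return !(fileSize < end);
}
```

With a 0x1000 byte file, naiveRangeAccepted(0x1000, 0xFFFFFFFF, 2) accepts the input because the sum wraps to 1, while an honest out-of-range pair such as (0x2000, 2) is rejected.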

The LangSec perspective on this error
There are any number of ways to fix the problem above. For example, checking that the file read in line 18 succeeds would, in the unmodified case, stop the DoS from happening. You could easily do fixes of this type, and a great many developers do.
What LangSec suggests instead is that you honor what I've written as point 2). We should be strict. A check for uiSize < uiFileSize should be added, and a check for the overflow itself too. Both should abort parsing if they fail. That would solve the problem. Also, being strict, the check in line 40 should return an error instead of proceeding. Even though you could probably find DLL names etc., it's still a breach of protocol, and aborting processing will minimize the risk of relying on assumptions that'll lead to another instance of a weird machine. Ideally you'd even go so far as to do that as part of sanity checking before you start pushing values into other variables, say in line 35. Be sure the data is correct and that you know exactly what to do with it before you use it.
If we step back to 1), what we need to notice is that in the malformed case what happens becomes dependent on what's on the heap after the vector - that is, our code isn't well defined. We could use a theorem solver to check for this, at least in this case. I found the bug by running some 80000 malware files through it, which I suppose would count as a kind of fuzzing. The key point is: if we first make sure any data gives a predictable outcome, even if that outcome means turning down the request for processing, we have written safe code.
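
A strict version of the range check can be written so that no intermediate sum is ever formed. This is a sketch under my own assumptions: the names RangeCheck and checkRange are made up for illustration and are not PeLib's API.

```cpp
#include <cstdint>

// Hypothetical result type; PeLib itself returns int error codes.
enum class RangeCheck { Ok, OffsetOutsideFile, SizeOutsideFile };

// Accepts only ranges [offset, offset + size) that lie entirely inside the file.
RangeCheck checkRange(std::uint32_t fileSize, std::uint32_t offset, std::uint32_t size) {
    if (offset > fileSize) {
        return RangeCheck::OffsetOutsideFile;
    }
    // fileSize - offset cannot wrap here because offset <= fileSize.
    if (size > fileSize - offset) {
        return RangeCheck::SizeOutsideFile;
    }
    return RangeCheck::Ok;
}
```

Both crafted inputs from above are rejected: (0xFFFFFFFF, 2) fails on the offset and (2, 0xFFFFFFFF) fails on the size, and the caller is expected to abort parsing on anything but Ok.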

The old school compiler solution for this bug

The x86 platform has always had a number of exception interrupts. I sometimes think of them in terms of old school CPU developers thinking deeply about what could go wrong in a computer. Probably because the first exception I always think of happens to be division by 0, and that happens to be the first in the x86 list. In 5th place on the x86 founding fathers' list comes the "overflow" interrupt. It's triggered by the INTO instruction, which checks if the overflow flag is set and, if so, raises interrupt 4. In short, the CPU has since the dawn of time held a solution to the integer overflow problem: add eax, ebx; into. Done. (Strictly speaking, the overflow flag reports signed overflow; an unsigned wrap like the one above sets the carry flag instead, so the unsigned equivalent is a jc after the add.) Overflows no longer serve as "mov" type instructions in a weird machine, but are always reduced to DoS - in fact a kind of DoS most developers know very well how to deal with using structured exception handling. Unfortunately the INTO instruction is almost never generated by compilers. Even the jo/jno instructions wired to the overflow flag are hardly ever generated by modern compilers. All three are listed in Z0mbie's opcode frequency list with 0%, and that they are in there in the first place is more likely down to errors in Z0mbie's disassembler than to them actually seeing use. So this remains an illusion. To make it worse, the operators on built-in integers cannot be overloaded in C++, so I can't even just make up an overflow checking + operator. I have no clue how many security breaches are the result of over/underflows in addition and subtraction, but it's probably not that uncommon. As we've seen above, it's an easy mistake to make because the mathematical + which we all know and love turns out to behave differently than the x86 "add". And while I'd love to have an "overflow free" integer available in C++, the LangSec solution of doing things right seems like where we'd want to go. (Well, if I had a choice I'd do both.)
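
Although + on the built-in integer types cannot be redefined, a small wrapper type can carry a checked +. The sketch below is only an illustration: CheckedU32 is a name I made up, and throwing std::overflow_error is my stand-in for the interrupt that INTO would raise.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical wrapper around a 32-bit unsigned value with a checked addition.
class CheckedU32 {
public:
    explicit CheckedU32(std::uint32_t v) : value_(v) {}

    std::uint32_t value() const { return value_; }

    // Throws instead of silently wrapping around.
    friend CheckedU32 operator+(CheckedU32 a, CheckedU32 b) {
        if (a.value_ > std::numeric_limits<std::uint32_t>::max() - b.value_) {
            throw std::overflow_error("unsigned addition overflow");
        }
        return CheckedU32(static_cast<std::uint32_t>(a.value_ + b.value_));
    }

private:
    std::uint32_t value_;
};
```

CheckedU32(0xFFFFFFFF) + CheckedU32(2) throws, so the overflow is reduced to an error the caller has to handle rather than a value the attacker gets to choose.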

No bugs, no insecurity. Even if it's a truism.


Sassaman, Patterson & Bratus:
Z0mbie's opcode statistics:

[1] There is another error in the code: uiSize < sizeof(PELIB_IMAGE_IMPORT_DIRECTORY<bits>) will lead to leaking information too. I'll not discuss that error here.

Speculation on rowhammer


27.07.2015: Daniel Gruss, Clementine Maurice and Stefan Mangard released a paper today detailing how row hammer can be done in JavaScript on Sandy Bridge, Ivy Bridge and Haswell. The method is indeed a read pattern in memory that evicts the aggressor address from the cache. They exploit that a cache set only has 12 or 16 entries (ways), so repeatedly using 13 or 17 addresses that map to a single cache set will cause L3 cache misses on the 13th/17th access. My experiments instead kept the entire cache full (which works like a charm, but seems too slow for row hammer). What they ended up with is certainly more advanced than what I've played with, but it's the same ball game. Thus the speculation in the old post below unfortunately holds true.
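
The effect of using one address more than there are ways can be illustrated with a toy model. This is a simulation only, under the loud assumption of true LRU replacement in a single cache set; real L3 replacement policies and slice mappings are more complicated, and the names LruCacheSet and countMisses are made up for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <list>

// Toy model of one cache set with true LRU replacement (an assumption).
class LruCacheSet {
public:
    explicit LruCacheSet(std::size_t ways) : ways_(ways) {}

    // Returns true on a hit and false on a miss. The list front is most recent.
    bool access(std::uint64_t tag) {
        auto it = std::find(lines_.begin(), lines_.end(), tag);
        if (it != lines_.end()) {
            lines_.erase(it);
            lines_.push_front(tag);
            return true;
        }
        if (lines_.size() == ways_) {
            lines_.pop_back();  // evict the least recently used line
        }
        lines_.push_front(tag);
        return false;
    }

private:
    std::size_t ways_;
    std::list<std::uint64_t> lines_;
};

// Cycles through `addresses` distinct tags `rounds` times and counts the misses.
std::size_t countMisses(std::size_t ways, std::size_t addresses, std::size_t rounds) {
    LruCacheSet set(ways);
    std::size_t misses = 0;
    for (std::size_t r = 0; r < rounds; ++r) {
        for (std::uint64_t tag = 0; tag < addresses; ++tag) {
            if (!set.access(tag)) {
                ++misses;
            }
        }
    }
    return misses;
}
```

In this model, cycling over 12 addresses in a 12-way set only misses on the first pass, while cycling over 13 addresses misses on every single access, which is what forces the reads out to DRAM.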

It's time to test your RAM! Disabling JavaScript will go a long way towards protecting you. White list sites that you consider safe; HTTPS is important to avoid man-in-the-middle injection of JavaScript. However, these measures are insufficient on their own - other advanced scripting languages could probably be abused too (Flash, Silverlight, ...).

A bit more speculation: Apple made an update to their EFI BIOS to mitigate it. I speculate that they increased the refresh rate, so that there are only 32ms between refreshes instead of the usual 64ms - the reason being that a 32ms refresh interval is recommended for running at high temperature and is thus likely to be readily available. This is insufficient to entirely rule out row hammer, as you'd need to drop the interval as low as 8ms to be safe. It's important to realize that less time between refreshes means a speed penalty, because you cannot read during the refresh. For 8ms intervals the penalty will be pretty steep, though it is probably acceptable for 32ms. More refreshes also cause more power consumption. I have seen wildly differing estimates of this, but I speculate that on a running laptop it's not a real issue, and if implemented right (say a 64ms refresh interval while the computer is sleeping) no issue at all.

The paper :

For the technically inclined Mark Seaborn's blog is an awesome resource:

Original Post
It's cool to be a malware hobbyist. I can write purely speculative blogs. It's like being a comedian doing news - it's just comedy... And this blog is pretty speculative - sorry.

The speculation

In my last post on the subject, "Row hammer fix POC and generic root kit detection using performance counters", I wrote that I doubted we'd ever see a script version of this bug. My reasons were that scripts are unlikely to use clflush or any non-temporal (MOVNT) instructions in any predictable manner, and would probably be too slow to flush the cache by accessing it in a pattern that the CPU wouldn't predict while still hammering quickly enough. The first thing that made me change my mind was when I stumbled upon an article which manipulates the cache and then uses it as a side channel to obtain information from outside a sandbox through JavaScript. This chipped away at my confidence. Then came another article, which concludes that JIT compilers would not generate the instructions we needed for row hammering. This scared me because I'd forgotten all about JIT compiling: while we might be fast enough without JIT, having JIT would definitely make it fast enough. And finally came a tweet from Lava (@lavados):

Yesterday @BloodyTangerine and I flipped bits on Ivy Bridge without clflush... #rowhammer

So the plot thickens. I don't know how BloodyTangerine and Lavados are flipping the bits, but if I were a betting man I'd place my money on an approach like Spy in the Sandbox. There is a bit of evidence in that direction. The cache is built differently on different chipsets, and this could be a reason to mention the chipset. Conclusion: I was wrong. My new opinion is that we'll see a JavaScript row hammer exploit.
Full disclosure: I had a 280 character chat with Lava after I wrote this. That conversation is not part of this blog, because everything was said in private. I dare say that if this stuff interests you, it would probably be worthwhile following Lava's tweets.

My Experiments

I played around a bit with row hammer after my last blog. The reason that row hammer doesn't occur naturally all the time is that the cache catches reads to any address being hammered. This is also why the performance counter on the last level cache works well for detecting and preventing row hammering. To get around this, Dullien and Seaborn (the original row hammer stuff) used the clflush instruction. I tried to hammer using the MOVNTQ instruction and was not successful. Then, after reading "Spy in the sandbox", I started writing a cache flush routine using multiple threads inspired by it. The idea behind using multiple threads is that the biggest cache on modern CPUs is the level 3 cache, and it is shared by all cores (and hyper threads), making it much easier to keep the cache full of content unrelated to the row I'm hammering. The first and second level caches I don't consider too much of a problem, since they are quite small and could probably be kept full of unrelated stuff by a single thread. Unfortunately I never finished this code, so I'm not sure if it'd actually work for row hammer. The real issue isn't keeping my hammering row out of the cache, but doing it while hammering enough to cause bit flips.

Why row hammer in scripts would be really bad news

If Lavados and BloodyTangerine are indeed using the method I was playing with - or even a derivative of it - then it's really bad. Mark Seaborn's fix of black listing clflush in the validator of the NaCl sandbox would not extend to this, because now normal mov instructions would suffice. Even adc, cmp, sub, inc, ... would suffice, and these are common and useful instructions. Even worse, since browsers all too readily run JavaScript, row hammer could easily be extended from a local privilege elevation to a full remote breach of the attacked host. Worse yet, such an attack could be inserted automatically through man-in-the-middle on non-HTTPS connections, like China's "Great Cannon" in a worst case scenario. You might argue that ECC RAM would mitigate this, but that's only a half truth. ECC will most of the time reduce it to a DoS attack, but row hammer would from time to time flip more than 1 bit and dance right through the ECC on the RAM. It wouldn't be the perfect storm because there is a tiny bit of good news, though not much: row hammer is difficult to weaponize and we know how to defeat it - even if the method is far more complex than traditional fixes.


xuanwulab (2015):
Oren et al(2015):
Fogh (2015): Row hammer fix POC and generic root kit detection using performance counters (This blog)