Zach Burlingame
Programming, Computers, and Other Notes on Technology

Archive for August, 2011

Setting the sticky bit recursively on directories only

Wednesday, August 31st, 2011

This is more of a reminder for me.

Several times recently I’ve run into problems where files in a MultiUser Mercurial repository on a Linux host are getting the wrong group permissions. If you properly set the group sticky bit (technically the setgid bit) when you first set up the repo, you won’t have this issue. To fix it, I needed to set the bit on every directory in the .hg/store directory recursively.

find /path/to/.hg/store/ -type d -exec chmod g+s {} \;
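The same fix can also be scripted; here is a minimal Python sketch equivalent to the find command above (the path at the bottom is a placeholder, not a real repository):

```python
import os
import stat

def set_setgid_on_dirs(root):
    """Set the setgid bit (chmod g+s) on every directory under root,
    mirroring: find root -type d -exec chmod g+s {} \\;"""
    for dirpath, _dirnames, _filenames in os.walk(root):
        mode = os.stat(dirpath).st_mode
        os.chmod(dirpath, mode | stat.S_ISGID)

# Point this at your repository's store, e.g.:
# set_setgid_on_dirs("/path/to/.hg/store")
```

Like the find version, this touches directories only, so regular files keep their existing modes.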

HTTP File Download Reassembly in WireShark with Chunked Transfer Encoding

Wednesday, August 24th, 2011

I was having problems with binaries I was downloading with a particular application the other day. As part of the debugging process at one point, I was taking packet captures with Wireshark inside the client LAN, at the client router’s WAN, and tcpdump from the server. I was then reassembling the file from the stream in each packet capture and comparing them to see where the corruption was occurring relative to the copy that resided on the server.

To accomplish this, I was going to the HTTP GET message packet in Wireshark. Then I would right-click on the packet and select Follow TCP Stream. Next I would select only the direction of traffic from the server to the client (since this was a download). Then I would make sure RAW was selected and save the file. Finally I would open the file up in a hex editor, remove the HTTP header that winds up prepended to the file, and save it. Annnnd then the file was corrupted.

Doing a binary diff of a valid copy of the file with the reconstructed file using 010 Editor I could see that the only differences were several small sections of the file with values like these spaced throughout the file:

Hex: 0D 0A 31 30 30 30 0D 0A
ASCII: \r\n1000\r\n

and one of these at the end of the file:

Hex: 0D 0A 30 30 0D 0A
ASCII: \r\n00\r\n

I confirmed that each of the packet captures at the various points along the way all had the same result. Where the heck was this random data getting injected into my stream and better still, why?!

The first clue that it wasn’t truly random data was the \r\n values. Carriage Return-Line Feed (CRLF) is a staple demarcation value in the HTTP protocol. My second clue was that the values were typically 1000 and 0. Although represented with ASCII characters in the file, if you interpret them as hexadecimal they are 4096 and 0 in decimal. When doing buffered I/O, a 4K buffer is very common, as is getting 0 back from a read function when you reach EOF.
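That interpretation is easy to check: the ASCII digits are sizes written in hexadecimal:

```python
# "1000" and "0" are hexadecimal sizes, not decimal values
assert int("1000", 16) == 4096  # a common 4K I/O buffer size
assert int("0", 16) == 0        # a zero-length read, i.e. EOF
print("the ASCII digits decode to 4096 and 0")
```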

As it turns out, the particular behavior I was seeing was a feature of the HTTP/1.1 protocol called Chunked Transfer Encoding. The Wikipedia article does a great job explaining it, but basically it allows content to be sent before its exact size is known. It does this by prepending the size to each chunk:

The size of each chunk is sent right before the chunk itself so that a client can tell when it has finished receiving data for that chunk. The data transfer is terminated by a final chunk of length zero.

Ah-ha! So my naïve manual file reconstruction from the Wireshark packet capture of the HTTP download was flawed. Or was it? I checked the file on disk and sure enough it too had these extra data values present.

Once again, Wikipedia to the rescue (emphasis mine):

For version 1.1 of the HTTP protocol, the chunked transfer mechanism is considered to be always acceptable, even if not listed in the TE request header field, and when used with other transfer mechanisms, should always be applied last to the transferred data and never more than one time.

The server was utilizing chunked transfer encoding but the application I was using wasn’t fully HTTP/1.1 compliant and was thus doing a naïve reconstruction just like me! So, if you find yourself doing file reconstruction from packet captures of HTTP downloads, make sure you take chunked transfer encoding into account.

Getting the Load Count for a DLL

Wednesday, August 17th, 2011

Recently I was trying to unload a DLL from a running process so that I could delete it from disk, but it just wouldn’t delete. Looking at the Modules pane in Visual Studio, I could see that the DLL was still loaded. I double- and triple-checked all of my calls to LoadLibrary for a corresponding call to FreeLibrary, and everything checked out. I needed to figure out what was loading it and where. One of the things that I wanted to know was, “What is the current load count for the DLL?”

Windows maintains a load count for each module on a per-process basis. When the load count reaches zero, the module will be unloaded. The problem is that this load count is not accessible through documented API calls. To get it, you need to use some undocumented structures and API calls from ntdll.dll. Fortunately, like so many other issues you run into, someone else has already run into it and Google knows where they are at. In this case there is a great article here (unfortunately I couldn’t figure out who specifically was the contributing author for that article so that I could give them due props).

The securityxploded article did such a great job of explaining it that I won’t bother duplicating it here. I based my implementation on theirs; however, I updated it to support both MBCS and Unicode via the TCHAR macros. It’s available as part of my ModuleUtilities library.