Tuesday, January 26, 2010

Failure at using COM/OLE: Part 2

Last time we just touched on a small issue, as a precursor to the important one.

Consider an application that populates an STGMEDIUM structure as follows:
  • Sets tymed to TYMED_ISTREAM
  • Sets pstm to a pointer to an IStream object
  • Sets pUnkForRelease to a pointer to the IUnknown interface of that same IStream object
  • This IStream object has a reference count of 1.
What's wrong? Let's read the documentation for ReleaseStgMedium:
When the original provider of the medium is responsible for freeing the medium, the provider calls ReleaseStgMedium, specifying the medium and the appropriate IUnknown pointer as the punkForRelease structure member. Depending on the type of storage medium being freed, one of the following actions is taken, followed by a call to the IUnknown::Release method on the specified IUnknown pointer.

Medium          | ReleaseStgMedium Action
TYMED_ISTREAM   | Calls IStream::Release.
(My emphasis, irrelevant parts of table omitted)

In other words, calling ReleaseStgMedium on that STGMEDIUM structure releases the IStream object twice: once through pstm and once through pUnkForRelease. The object only holds one reference, so, unsurprisingly, things like to blow up the second time.
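To make the reference counting concrete, here is a small C++ model of it. It deliberately avoids the real COM headers: MockStream, MockStgMedium and mockReleaseStgMedium are hypothetical stand-ins that follow the documented behaviour quoted above (release the stream, then release pUnkForRelease), so treat it as a sketch of the arithmetic rather than real OLE code.

```cpp
#include <cassert>

// Hypothetical stand-in for the IStream in the post: just enough reference
// counting to watch what happens to it. It is NOT the real IStream interface.
struct MockStream {
    long refs = 1;         // starts with a reference count of 1, as in the post
    int overReleases = 0;  // Release() calls made after the count reached zero

    long AddRef() { return ++refs; }
    long Release() {
        if (refs == 0) {   // a real COM object would already have freed itself here
            ++overReleases;
            return 0;
        }
        return --refs;
    }
};

// Hypothetical model of the two STGMEDIUM fields that matter for TYMED_ISTREAM.
struct MockStgMedium {
    MockStream* pstm = nullptr;            // plays the role of STGMEDIUM::pstm
    MockStream* pUnkForRelease = nullptr;  // plays the role of STGMEDIUM::pUnkForRelease
};

// Models the documented ReleaseStgMedium behaviour for TYMED_ISTREAM:
// IStream::Release on pstm, followed by IUnknown::Release on pUnkForRelease
// when that pointer is non-NULL.
void mockReleaseStgMedium(MockStgMedium& m) {
    assert(m.pstm != nullptr);  // this model only covers a medium holding a stream
    m.pstm->Release();
    if (m.pUnkForRelease != nullptr) m.pUnkForRelease->Release();
    m.pstm = nullptr;
    m.pUnkForRelease = nullptr;
}

// The pattern from the post: pUnkForRelease is the same stream, but no extra
// reference was taken for it. Returns how many Release calls hit a dead object.
int buggyPattern() {
    MockStream s;
    MockStgMedium m;
    m.pstm = &s;
    m.pUnkForRelease = &s;  // one reference, two owners
    mockReleaseStgMedium(m);
    return s.overReleases;
}

// Variant A: leave pUnkForRelease NULL, so only IStream::Release runs and it
// drops the single reference.
int fixNullUnkForRelease() {
    MockStream s;
    MockStgMedium m;
    m.pstm = &s;
    m.pUnkForRelease = nullptr;
    mockReleaseStgMedium(m);
    return s.overReleases;
}

// Variant B: keep pUnkForRelease pointing at the stream, but give it its own
// reference first so the two Release calls are balanced.
int fixAddRefForRelease() {
    MockStream s;
    MockStgMedium m;
    m.pstm = &s;
    s.AddRef();             // separate reference owned by pUnkForRelease
    m.pUnkForRelease = &s;
    mockReleaseStgMedium(m);
    return s.overReleases;
}
```

Under this model, buggyPattern() reports one over-release while both variants report none. Leaving pUnkForRelease NULL is the simpler option when the stream is the only thing to free; variant B only shows that the counts balance mechanically if the stream really must double as pUnkForRelease.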

The sad thing is that it didn't take much time to look up the problem: all of 30 seconds once I found out that freeing TYMED_ISTREAM STGMEDIUMs was the issue. You'd better allocate more time for "convincing person X that the problem exists".

Wednesday, January 20, 2010

Windows 7: SATA controller in AHCI mode and standby issues

I recently decided to switch my SATA controller into AHCI mode. With this information that was easy enough.

What followed, though, were "BSODs" (stop errors), mainly when resuming from standby. Sadly, no minidumps were created for these, but the errors were either KERNEL_STACK_INPAGE_ERROR or KERNEL_DATA_INPAGE_ERROR (I didn't note which). Additionally, some applications crashed when resuming from standby with exception code C0000006 (STATUS_IN_PAGE_ERROR).

The culprit apparently was KB977178, "You receive various Stop error messages in Windows 7 or in Windows Server 2008 R2 when you try to resume a computer that has a large SATA hard disk". I noticed that it appears to update only the Microsoft AHCI driver.

I have a large page file on another large non-system drive (and a small one on my system drive), so it fits the problem description: presumably the page file on that drive couldn't be accessed until the drive had spun up, causing the errors. Indeed, after installing the hotfix, the problem stopped.

So, this is just a hint in case anyone else runs into the same unfortunate issue.

Windows 7 NTFS worry...

Out of the blue, my Windows 7 install decided to run chkdsk on my system drive on startup (once). It didn't find any problems.

A little more digging revealed that Ntfs event 55 had been logged the last time the computer was on:
"The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume [volume name]."

Obviously, that's a little worrying. Since chkdsk didn't find anything actually wrong, I did some stress testing to double-check for stability issues, but everything seemed fine. The drive in question is a Samsung SLC SSD, and again there's no real sign of any issues there: the S.M.A.R.T. data is OK, and the normalised wear-levelling attribute is actually still at 99 (!).

So I decided to leave it at that and wait to see whether it recurred. (Actually, I also decided to switch my SATA controller into AHCI mode, which unleashed some separate problems, but I'll write about those separately.)

Occur again it did, several days later. This time I caught the message in the event log before I shut down the computer. I checked the volume's dirty flag, and it was indeed set. Again, chkdsk didn't flag anything as wrong.

It seemed to me that something must have changed recently to start triggering this. I could think of a few candidates, but I simply disabled the real-time protection of my anti-virus (MSE "Ongoing Beta"). Some weeks later, it hasn't recurred, but that may well be a coincidence. I'm not sure what to make of it, so I'll have to keep monitoring it.

Update: The Windows photo import feature seems to like triggering this (see comments). I'm still investigating whether MSE is relevant.

Update: It tends to happen when importing photos, though not reliably. So far it has only happened with MSE real-time protection enabled. The photo importer is set to import to the "My Pictures" folder, and picture 'streaming' is also enabled. You might also get the dreaded "The file or directory is corrupted and unreadable" message. I should also add that I tested on 64-bit Windows 7.

It looks like an OS bug anyway; I don't see why anti-virus software should cause this type of error.

Some relevant links:
http://groups.google.com/group/tortoisesvn/browse_thread/thread/69f3e36e6bbf7389?pli=1 (note the post title; it's easy to miss)
http://social.technet.microsoft.com/Forums/en/w7itprogeneral/thread/6c3ed415-704b-482d-a20b-69277f6cd4ad (my god there are some idiotic replies)

The first one seems interesting: according to it, this is an issue with file locks.

Update: A statement from a Microsoft employee from the TechNet forums:
"This is a known regression in Windows 7 in the NTFS file system. It occurs when doing a superceding rename over a file that has an atomic oplock on it (atomic oplocks are a new feature in Windows 7). The indexer uses atomic oplocks which is why it helped when you disabled the indexer. Explorer also uses atomic oplocks which is why you are still seeing the issue. When this occurs STATUS_FILE_CORRUPT is incorrectly returned and the volume is marked "dirty" which is a signal to the system that chkdsk needs to be run. No actual corruption has occured.

Neal Christiansen
NTFS Development Lead"

Wednesday, January 13, 2010

People's failures at using COM

What does MSDN say about the ppvObject parameter in IUnknown::QueryInterface?
ppvObject [out]

The address of a pointer variable that receives the interface pointer requested in the riid parameter. Upon successful return, *ppvObject contains the requested interface pointer to the object. If the object does not support the interface, *ppvObject is set to NULL.

The relevant bit is the last sentence: if the object does not support the interface, *ppvObject is set to NULL.

We now have the following code, written by X:
#define COM_QI_BEGIN() HRESULT STDMETHODCALLTYPE QueryInterface(REFIID iid,void ** ppvObject) { if (ppvObject == NULL) return E_INVALIDARG;
#define COM_QI_ENTRY(IWhat) { if (iid == IID_##IWhat) {IWhat * temp = this; temp->AddRef(); * ppvObject = temp; return S_OK;} }
#define COM_QI_END() return E_NOINTERFACE; }

COM_QI_BEGIN()
COM_QI_ENTRY(IUnknown)
COM_QI_ENTRY(IDataObject)
COM_QI_END()
This expands to:
HRESULT STDMETHODCALLTYPE QueryInterface(REFIID iid, void ** ppvObject)
{
    if (ppvObject == NULL) return E_INVALIDARG;
    { if (iid == IID_IUnknown) {
        IUnknown * temp = this; temp->AddRef(); * ppvObject = temp; return S_OK; } }
    { if (iid == IID_IDataObject) {
        IDataObject * temp = this; temp->AddRef(); * ppvObject = temp; return S_OK; } }
    return E_NOINTERFACE;
}

Does that look like it sets *ppvObject to NULL on failure?
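It doesn't: the only assignment to *ppvObject happens on the success path. Below is a sketch of the same macros with the out-parameter cleared right after the argument check. To keep it compilable outside Windows, HRESULT, IID, IUnknown and IDataObject are hypothetical stand-ins (the result codes are placeholder values, not the real HRESULTs), STDMETHODCALLTYPE is dropped, and DataObject is an illustrative class; in real code these come from the Windows SDK headers.

```cpp
#include <cassert>

// Hypothetical stand-ins for the Windows COM types so this sketch compiles
// without <windows.h>. The result codes are placeholders, not real HRESULTs.
enum HRESULT { S_OK = 0, E_INVALIDARG, E_NOINTERFACE };

struct IID { int value; };
inline bool operator==(const IID& a, const IID& b) { return a.value == b.value; }
const IID IID_IUnknown{1};
const IID IID_IDataObject{2};
const IID IID_IStream{3};  // an interface DataObject below does not implement

struct IUnknown {
    virtual HRESULT QueryInterface(const IID& iid, void** ppvObject) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
    virtual ~IUnknown() = default;
};
struct IDataObject : IUnknown {};  // the real interface has more methods

// The macros from the post, with one change: *ppvObject is cleared right after
// the argument check, so every failure path leaves it NULL as MSDN requires.
#define COM_QI_BEGIN()                                                  \
    HRESULT QueryInterface(const IID& iid, void** ppvObject) override { \
        if (ppvObject == nullptr) return E_INVALIDARG;                  \
        *ppvObject = nullptr;
#define COM_QI_ENTRY(IWhat)                                             \
    if (iid == IID_##IWhat) {                                           \
        IWhat* temp = this;                                             \
        temp->AddRef();                                                 \
        *ppvObject = temp;                                              \
        return S_OK;                                                    \
    }
#define COM_QI_END() return E_NOINTERFACE; }

// Illustrative object using the corrected macros.
class DataObject : public IDataObject {
    unsigned long refs_ = 1;
public:
    COM_QI_BEGIN()
    COM_QI_ENTRY(IUnknown)
    COM_QI_ENTRY(IDataObject)
    COM_QI_END()

    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        assert(refs_ > 0);  // a live object always holds at least one reference
        unsigned long r = --refs_;
        if (r == 0) delete this;
        return r;
    }
    unsigned long refs() const { return refs_; }
};

// A failed query must leave the out-parameter NULL, even if the caller's
// variable held a stale value beforehand.
bool failedQiClearsOutParam() {
    DataObject* obj = new DataObject;
    int dummy = 0;
    void* out = &dummy;  // simulates a caller variable holding a stale value
    HRESULT hr = obj->QueryInterface(IID_IStream, &out);
    obj->Release();      // drops the only reference; the object deletes itself
    return hr == E_NOINTERFACE && out == nullptr;
}

// A successful query returns the interface and takes a new reference.
bool successfulQiAddRefs() {
    DataObject* obj = new DataObject;
    void* out = nullptr;
    HRESULT hr = obj->QueryInterface(IID_IDataObject, &out);
    bool ok = hr == S_OK && out == static_cast<IDataObject*>(obj) && obj->refs() == 2;
    static_cast<IDataObject*>(out)->Release();  // balances the AddRef in QueryInterface
    obj->Release();
    return ok;
}
```

Clearing the pointer once in COM_QI_BEGIN covers every failure path, including the final E_NOINTERFACE, so none of the per-interface COM_QI_ENTRY lines need to change.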

More on the general subject here.