In the Windows API, there’s a function called Shell_NotifyIcon() which is used to manage your program’s tray icon. All my programs, like SuperF4 and AltDrag, have a tray icon, which is also the only UI they expose. As the number of users of my programs has grown, the number of bug reports regarding this function has increased. The problem with this API is that if (when) the Windows shell (explorer.exe) for some reason isn’t responding, the API call will fail. As a result, your tray icon won’t get updated, or worse yet, it won’t be added in the first place.
The first time I received a bug report about this was in December last year, almost exactly one year ago. A user was using AltDrag and complained that the tray icon was not appearing when he started his computer, and that an error message mentioning Shell_NotifyIcon() appeared instead. After some googling, I found out that this most often happened on slow computers with antivirus software installed. The antivirus program slows down the computer so much that explorer.exe hasn’t fully initialized when the autostart programs try to add their tray icons. I verified that the user had an antivirus program installed, and we managed to work out a workaround for the problem. The workaround simply tried to add the tray icon up to five times before giving up (code).
This worked great, and there was much rejoicing. It worked great until April this year, at which time I got two bug reports in rapid succession (one day between them). At this time I was using Vista and I had even encountered the error message myself, even though I have a fast computer and I don’t use an antivirus program. I eventually managed to track down why the retry code wasn’t working: the code only retried when it was adding the tray icon, and it didn’t retry when it was updating the tray icon. Somehow the tray icon was added successfully, and then when the program tried to update the tray icon, explorer wasn’t responding and the update failed. For readers unfamiliar with my programs, they first initialize in a disabled state (and add the tray icon), they then enable themselves (hook the keyboard etc.), and finally they update the tray icon to represent the enabled state. Another fix was made to make the code always retry, whether it was trying to add or update the tray icon, and a delay of 100 ms was added between each retry (code).
This worked great, and there was much rejoicing. It worked great until a few months later, when bug reports about this started rolling in again. Another fix was made which increased the number of retries to 100. Combined with the 100 ms sleep, this means that the code should keep trying to add/update the tray icon for at least 10 seconds. This seems to have solved the problem for everyone.
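The retry loop described above can be sketched roughly as follows. This is not the actual code from AltDrag or SuperF4, just a minimal, portable illustration. The names retry_op, tray_op_fn, sleep_ms_fn, fake_shell, fake_tray_op, fake_sleep and total_slept_ms are all made up for this example. In a real program, the operation callback would presumably wrap a call to Shell_NotifyIcon() with NIM_ADD or NIM_MODIFY, and the sleep callback would wrap Sleep().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types for this sketch (not part of the Windows API). */
typedef bool (*tray_op_fn)(void *ctx);     /* one attempt; returns true on success */
typedef void (*sleep_ms_fn)(unsigned ms);  /* pause between attempts */

/* Calls op up to max_tries times, pausing delay_ms after every failed
 * attempt except the last one. Returns the number of the attempt that
 * succeeded (1-based), or 0 if every attempt failed. */
unsigned retry_op(tray_op_fn op, void *ctx,
                  unsigned max_tries, unsigned delay_ms,
                  sleep_ms_fn sleep_ms)
{
    assert(op != NULL && sleep_ms != NULL);
    for (unsigned attempt = 1; attempt <= max_tries; attempt++) {
        if (op(ctx)) {
            return attempt;
        }
        if (attempt < max_tries) {
            sleep_ms(delay_ms);
        }
    }
    return 0;
}

/* Stand-in for an unresponsive shell: fails a fixed number of times,
 * then succeeds. Used only to exercise retry_op without Windows. */
struct fake_shell {
    unsigned failures_left;
    unsigned calls;
};

bool fake_tray_op(void *ctx)
{
    struct fake_shell *shell = (struct fake_shell *)ctx;
    shell->calls++;
    if (shell->failures_left > 0) {
        shell->failures_left--;
        return false;
    }
    return true;
}

/* Simulated sleep: records the requested time instead of waiting. */
unsigned total_slept_ms = 0;

void fake_sleep(unsigned ms)
{
    total_slept_ms += ms;
}
```

With 100 tries and a 100 ms delay, this particular sketch pauses 99 times, so in the worst case it gives up after about 9.9 seconds of sleeping plus however long the calls themselves take. Using the same helper for both adding and updating the icon is what avoids the add-only retry mistake described earlier.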
I am now using Windows 7 and I have noticed that most people who reported the error also used Windows 7. It seems that Microsoft has worsened the situation with this release. I have also noticed that some of my other programs on autostart fail to add their tray icon sometimes (psst, Xfire developers). The reason is this bug! Yes, it is a bug, but it’s not a bug in the program. It is really a bug in Windows. These programs don’t retry when they add their tray icon, and they probably don’t even check if it was added successfully. I can understand why: this isn’t something a programmer should have to worry about at all, and the workaround is horrible and unreliable. The only reason almost no one knows about this bug is that almost no programs display an error message when their tray icon fails to be added. Does your favorite app fail to add its tray icon on boot? That is because of this bug!
I have not updated all of my programs with the latest fix yet. So far only AltDrag has the latest fix in its stable version, and SuperF4 has a hotfix released separately on its website. I will have to roll out a new version for all my programs, a version which does nothing but work around a bug that exists in the operating system.
My recommendation to developers: Add code that retries whenever you do any kind of operation on the tray icon, include a delay between each try, and keep trying for at least 10 seconds before giving up. Also take a look at my code.
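One way to read "for at least 10 seconds" is as a time budget rather than a fixed number of tries, which matters if each failed call itself takes a noticeable amount of time. Below is a minimal sketch of that variant, not taken from any of the programs mentioned here. The names retry_until, attempt_fn, now_ms_fn, pause_ms_fn and everything prefixed with sim_ are hypothetical. On Windows, the clock callback could presumably wrap GetTickCount() and the pause callback could wrap Sleep().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types for this sketch. */
typedef bool (*attempt_fn)(void *ctx);            /* one attempt; true on success */
typedef unsigned long (*now_ms_fn)(void);         /* current time in milliseconds */
typedef void (*pause_ms_fn)(unsigned long ms);    /* wait between attempts */

/* Keeps calling attempt until it succeeds or until at least budget_ms
 * milliseconds have passed since the first attempt. Always makes at
 * least one attempt. Returns true on success, false on giving up. */
bool retry_until(attempt_fn attempt, void *ctx,
                 unsigned long budget_ms, unsigned long delay_ms,
                 now_ms_fn now_ms, pause_ms_fn pause_ms)
{
    assert(attempt != NULL && now_ms != NULL && pause_ms != NULL);
    unsigned long start = now_ms();
    for (;;) {
        if (attempt(ctx)) {
            return true;
        }
        if (now_ms() - start >= budget_ms) {
            return false;
        }
        pause_ms(delay_ms);
    }
}

/* Simulated environment, so the helper can be tried without a real shell.
 * The simulated shell starts "responding" once the simulated clock
 * reaches sim_ready_at_ms. */
unsigned long sim_clock_ms = 0;
unsigned long sim_ready_at_ms = 0;
unsigned sim_attempts = 0;

unsigned long sim_now(void)
{
    return sim_clock_ms;
}

void sim_pause(unsigned long ms)
{
    sim_clock_ms += ms;   /* advance the simulated clock instead of waiting */
}

bool sim_attempt(void *ctx)
{
    (void)ctx;            /* unused in this simulation */
    sim_attempts++;
    return sim_clock_ms >= sim_ready_at_ms;
}
```

A call such as retry_until(sim_attempt, NULL, 10000, 100, sim_now, sim_pause) corresponds to the 10 second, 100 ms recommendation. Whether a time budget or a fixed retry count is the better fit depends on how long a failing call blocks in practice, which this sketch does not try to model.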
All of this touches on a much deeper and more fundamental problem: Windows is unable to evolve. We are stuck with this problem because of application compatibility. Microsoft is afraid to trip on the wrong bit and cause mayhem with older applications, in which case they will eventually lose money. That is why Microsoft is unable to fix this bug. If a bug like this existed in an open operating system, it would simply be fixed, at the core, and the community would make sure that all applications are updated to work properly. This is impossible in Windows, and I promise you that we will still be working around this bug ten years from now. I guess that when Windows 8 is released, we will just have to increase the number of retries to 200.