Microsoft's Sidekick/Pink problems blamed on dogfooding and sabotage

mrkurt · on Oct 13, 2009

I think that's the most twisted definition of "dogfooding" I've ever heard.

Also, this article seems to be about 98% speculation from a site that has no previous history of "inside sources" at Microsoft. Boo on that.

jay_kyburz · on Oct 13, 2009

Yeah, I came in here to say that in my world "dogfooding" means using your own application during development. To eat your own dog food.

Timothee · on Oct 13, 2009

I agree with that definition as well. It's what got me to start reading the article, since I was really curious how Microsoft engineers using Sidekicks could have caused any problems.

tvon · on Oct 13, 2009

That's what came to mind for me as well, though I've never heard the term "dogfooding" before, just "eating our own dogfood" or something along those lines.

wglb · on Oct 13, 2009

Right, but the article seems to describe a culture that takes dogfooding to an extreme and rewrites or replaces the software systems of acquired companies with MS technologies.

karzeem · on Oct 13, 2009

Doesn't Google do the same thing, at least with a lot of its smaller acquisitions? Job #1 is rewriting the thing with Google's usual set of tools?

(Not to suggest that Google doing it makes it right or wrong.)

pyre · on Oct 13, 2009

I think the point is that Microsoft has a history of failures resulting from rewriting things in MS tech. At some point you just have to admit your weaknesses and deal with them rather than living in some sort of management-denial.

wizzard · on Oct 14, 2009

This.

I love how the author takes the fact that Microsoft/T-Mobile haven't issued an official statement on the cause of the outage, and decides that it therefore MUST have been caused by an elaborate and sophisticated act of sabotage by some ne'er-do-well engineer.

QED.

tlb · on Oct 13, 2009

Should there be a word? "NIH mentality" is close, but not exactly what's claimed in the article.

moonpolysoft · on Oct 14, 2009

The section on why Danger failed within MS and the section on dogfooding are mostly true in my experience; I've worked for an acquired company within MS for almost a year now. There is an unnatural compulsion within the company to expel foreign technologies, especially those of competitors.

Inevitably, acquired companies wind up porting their software to whatever insane crap their business unit is using as a development stack. We're about to start going through that pain right now.

tlb · on Oct 13, 2009

I don't think it's fair to rule out the possibility of a simple screw-up. I've been responsible for important web services and I took a lot of precautions, but I can't say it was inconceivable that something could have gone wrong causing major permanent data loss. There are so many ways in which failures can cascade in a complex system that there can be no absolute guarantees.

Can anyone quantify how much address book data there was? As an outside guess, 1,000,000 customers times 1000 addresses times 300 bytes each is only 300 GB. They could have kept an emergency backup on rsync.net for $300 / month. I keep secondary, well-encrypted backups of things I particularly care about (not nearly 300 GB worth) there, as well as on a USB disk under my bed, in addition to the regular complete backups.
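The estimate above is just multiplication; spelled out in Python it looks like this. The per-customer counts are the commenter's guesses, not Danger's figures, and the $/GB rate is only what the quoted $300/month implies, not a published price:

```python
# Back-of-envelope estimate; all three inputs are guesses, not real Danger data.
customers = 1_000_000
contacts_per_customer = 1_000
bytes_per_contact = 300

total_bytes = customers * contacts_per_customer * bytes_per_contact
total_gb = total_bytes / 10**9  # decimal gigabytes

quoted_monthly_cost = 300  # dollars, from the comment above
implied_rate = quoted_monthly_cost / total_gb  # dollars per GB per month

print(f"{total_gb:.0f} GB at ${implied_rate:.2f}/GB/month")  # 300 GB at $1.00/GB/month
```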

pyre · on Oct 13, 2009

> There are so many ways in which failures can cascade in a complex system that there can be no absolute guarantees.

What about off-site backups? How many failures can cascade in a way that wipes out off-site backups?

domodomo · on Oct 14, 2009

Totally. There was a line in the article about a "time bomb erasing everything including the tapes". Well... you leave all the tapes in your library and nothing gets sent off-site?

Or not even off-site; it's not like the building burned down. Just tapes that weren't left in the library.

The theory that they might have the data but just don't know how to restore things because all the Danger people are gone: now that actually seems somewhat more credible to me.

tlb · on Oct 14, 2009

Many backup systems copy the data automatically each night, but if the data is damaged on the main system and nobody notices right away it can overwrite the backup. Or, in the stress of late-night recovery sessions, someone can copy things in the wrong direction and overwrite the backup. Sometimes the backups are encrypted with a highly secure key, and the key is lost with the original failure, or stored within the encrypted backup like keys locked inside a car.

Because SANs allow multiple machines to write to the same disk, they have byzantine failure modes where data can be overwritten on disk, but cached for long periods of time by the machines that need it so there's no visible problem until after a power failure.
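A toy model of that failure mode, assuming two hosts that share one LUN and each keep a private read cache. The `Host` class and the stray-host scenario are hypothetical; real SANs and cluster filesystems are far more complicated, and this only shows how a host-side cache can hide on-disk damage until the cache is lost:

```python
# Toy model only: not how any real SAN or cluster filesystem is implemented.

class Host:
    def __init__(self, shared_disk: dict):
        self.disk = shared_disk  # the shared LUN: block number -> bytes
        self.cache = {}          # this host's private, volatile read cache

    def read(self, block: int) -> bytes:
        # Only go to the shared disk on a cache miss.
        if block not in self.cache:
            self.cache[block] = self.disk[block]
        return self.cache[block]

    def write(self, block: int, data: bytes) -> None:
        # Write through to the disk; other hosts' caches are NOT invalidated.
        self.disk[block] = data
        self.cache[block] = data

    def power_failure(self) -> None:
        self.cache.clear()  # the volatile cache is gone after the outage


disk = {7: b"address book"}
db_server = Host(disk)
stray_host = Host(disk)  # hypothetical: a machine mistakenly given the same LUN

db_server.read(7)                  # warms db_server's cache
stray_host.write(7, b"\x00" * 12)  # clobbers the block on the shared disk

print(db_server.read(7))  # b'address book': the stale cache hides the damage
db_server.power_failure()
print(db_server.read(7))  # all zero bytes: the damage only shows up now
```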

When data is migrated to new database hardware, sometimes the backups haven't actually been tested. Sometimes during the cutover, someone who didn't fully understand the architecture backs up the new (empty) machine on top of the old machine's backups, so there is no backup for a while. All these mistakes can happen pretty easily when sysadmins are woken up in the middle of the night to fix a service outage while people are yelling at them. I have made some of these mistakes myself, though so far I've been lucky.
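One cheap guard against the "empty machine backed up over the old backups" mistake is a sanity check before overwriting anything. This is a hypothetical sketch: the function names are made up, and the 50% threshold is an arbitrary assumption you would tune to how much your data normally churns:

```python
# Hypothetical pre-backup guard: if the source is suddenly much smaller than the
# last good backup (say, a freshly cut-over, still-empty database server), stop
# and ask a human instead of overwriting. The 0.5 ratio is an arbitrary choice.

def safe_to_overwrite(source_bytes: int, last_backup_bytes: int,
                      min_ratio: float = 0.5) -> bool:
    if last_backup_bytes == 0:
        return True  # nothing valuable to lose yet
    return source_bytes >= min_ratio * last_backup_bytes


def run_backup(source_bytes: int, last_backup_bytes: int) -> str:
    if not safe_to_overwrite(source_bytes, last_backup_bytes):
        raise RuntimeError(
            f"source is {source_bytes} bytes but the last backup was "
            f"{last_backup_bytes} bytes; refusing to overwrite it")
    # ... the actual copy would happen here ...
    return "ok"


print(run_backup(290_000_000_000, 300_000_000_000))  # ok: a normal night
try:
    run_backup(4_096, 300_000_000_000)  # new, empty machine after cutover
except RuntimeError as err:
    print("blocked:", err)
```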

The computer industry has never learned to admit that there is always some level of risk. You won't hear an oil company exec say "Inconceivable!" when an iceberg crashes into a North Sea oil rig, or a refinery catches on fire. Systems fail, and all you can do is try your best to avoid it and move forward after it does happen.

pyre · on Oct 14, 2009

A lot of those possibilities are preventable, though. If you back up through an automated rsync, why not use versioning (à la rdiff)? Why not periodically swap out the disks on the backup machine, so that if a disk gets hosed you still have some data backed up, even if it is old data?
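A minimal sketch of what versioning buys you: each run adds a new generation instead of overwriting the last one, so a silently corrupted source doesn't destroy the last good copy. The `VersionedBackup` class and the seven-generation retention are made up for illustration, and keeping full copies in memory is a simplification; real tools store deltas on disk. Swapping physical disks is the same idea done by hand:

```python
# Illustrative only: generations are full in-memory copies, not on-disk deltas.
from collections import deque


class VersionedBackup:
    def __init__(self, keep: int = 7):
        self.generations = deque(maxlen=keep)  # the oldest falls off automatically

    def run(self, snapshot: dict) -> None:
        self.generations.append(dict(snapshot))  # copy; never mutate old generations

    def latest_matching(self, is_good):
        # Walk from newest to oldest; return the first generation that passes the check.
        for gen in reversed(self.generations):
            if is_good(gen):
                return gen
        return None


backup = VersionedBackup(keep=7)
backup.run({"alice": "555-0100"})
backup.run({"alice": None})  # silently corrupted source, backed up anyway
good = backup.latest_matching(lambda g: all(v is not None for v in g.values()))
print(good)  # {'alice': '555-0100'}
```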

Some of the things, like 'locking the keys in the car', can be mitigated by making sure that you test your backup system.
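A sketch of the simplest possible restore drill: back a file up, restore it to a separate location, and compare SHA-256 digests. Here `shutil.copy2` stands in for whatever backup and restore tool you actually use, and the directory and file names are hypothetical. A real drill would presumably run on a different machine using only the keys and documentation stored off-site, which is what catches the keys-in-the-car case:

```python
# Restore drill sketch; shutil.copy2 is a stand-in for a real backup/restore tool.
import hashlib
import pathlib
import shutil
import tempfile


def sha256_of(path: pathlib.Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read 1 MiB at a time
            h.update(chunk)
    return h.hexdigest()


def restore_drill(original: pathlib.Path, backup_dir: pathlib.Path,
                  restore_dir: pathlib.Path) -> bool:
    backed_up = pathlib.Path(shutil.copy2(original, backup_dir / original.name))
    restored = pathlib.Path(shutil.copy2(backed_up, restore_dir / original.name))
    return sha256_of(original) == sha256_of(restored)


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    for name in ("live", "backup", "restore"):
        (root / name).mkdir()
    db = root / "live" / "contacts.db"  # hypothetical file name
    db.write_bytes(b"alice,555-0100\n")
    print(restore_drill(db, root / "backup", root / "restore"))  # True
```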

Obviously you can't prevent everything, but saying that some of these issues are 'unavoidable risk' is like an oil company exec saying "shit happens" in response to reports that one of their oil tanker captains crashed a ship while drunk.

The real problem is that backup is usually an afterthought.

rit · on Oct 13, 2009

Good article.

Once upon a time, I wrote a series of open-source iSync plugins for the Sidekick/Hiptop platform, back when Danger first opened up their XML-RPC service.

Then it came out that in order to run in "production" (i.e., on a phone without developer provisioning loaded), your app needed T-Mobile's permission. We were repeatedly told "no" on the iSync plugin; they had no interest in supporting the Mac. Then, of course, a commercial plugin appeared, and I got fed up and stopped trying, with the clear impression that commercial was what T-Mobile wanted.

Mostly, however, it was the shock of realizing that as a standard user you had no access to your own data. You were completely and utterly locked in to that device, with no alternative.

ajg1977 · on Oct 14, 2009

Are you kidding? It's a terrible article that consists of nothing but speculation and conflicting segments that attribute the data loss to "dogfooding", and/or sabotage, and/or aggressive non-beneficial firmware updates, and/or incompetence.

Well done AppleInsider, you managed to nail the problem simply by covering every possible base.

rit · on Oct 15, 2009

Fair enough. In hindsight:

- I'm at the point where my brain mostly filters out all the speculation crap; I come across it so often. There were good tidbits in there covering some of the contract issues, etc.
- I think the "Good article" was almost a reflex (I can't think of the word[s] I'm looking for, but something you say out of habit without thinking). It kind of came out without my considering anything beyond "I found interesting things in it". I apologize; you are in fact correct that it was incredibly speculative.

jsz0 · on Oct 14, 2009

Microsoft's insistence on rebuilding with their own technologies must put them at a big competitive disadvantage. Google can buy up just about any small web company and have 100% code compatibility from day one.

blasdel · on Oct 14, 2009

That's not true in the slightest, unless that "small web company" is using App Engine.

Microsoft has a hardon for dev-managed directly-addressed OS instances running on x86 machines, so you could at the very least migrate to their extant hosting infrastructure. Even if you have to rewrite your app in C# + SQL Server, it's still going to be a direct gloss for most traditional webapps.

Google does no such thing: everything is massively distributed at every level, where blocks of infrastructure are managed as ideal services independently of any application, and addressed at the datacenter level. You are not going to get to manage your own machines, access a traditional filesystem, use relational databases, or have a direct socket to the client or direct access to internet hosts: basically anything you take for granted.