Wed 7 Jan 2009
A couple of years ago, I worked on a service that was supposed to help you chat live with other folks who were looking at websites on topics similar to yours. We were really concerned that a lot of people exchanging messages in real time would overwhelm our servers, so we spent a lot of time developing a highly scalable back-end. I think we did a really good job of it: we used some technology that had handled similar challenges quietly and reliably for many years, and we went through several informative rounds of load testing and tuning until we felt comfortable handling tens of thousands of users.
We never got tens of thousands of users. The engineering effort was a fun challenge, and quite educational, for me and my colleagues. But it turned out the product didn’t attract enough users to warrant any scalability concerns at all. There were much more elemental problems, like the way you installed it and the initial marketing work to get it off the ground. These were all things we engineers earnestly hoped and believed were not our responsibility, because they didn’t feel like engineering and weren’t amenable to our preferred methods (i.e. writing lots of good code). In other words, we were sunk by problems we never even considered.
The problem is that we’re surrounded by information that confirms the threats we already recognize. There are all those competitors that flubbed it because their engineers weren’t as smart. They got themselves Techcrunched and crashed under the pressure. Or they had some great feature but made it too hard to use.
The flaw in this sample, though, is its survivorship bias. The failures we hear about are only the ones big enough to get mentioned in the news. The more common failure might be the two guys who never manage to get out of their garage. Or the group of smart engineers who spend too much time working on the problems that are amenable to good code rather than the problems that end up being important to users, or to the people who would have been users if you had actually done something they cared about!
So, what does the canonical advice formula give us when applied to this story? I think it’s pretty simple: try to fail early. Have the courage to ignore concerns that won’t really threaten you for a few months, because there are other, more pressing things that will hit you first; you just don’t happen to know what they are yet. They’re probably things you simply overlooked: nuances of building stuff people want, or appealing to the right audience, or not hiding your best features. If you discover those failures before you’ve spent two months working on scalability, you probably have a better chance of turning things around before everyone gets demoralized.
Follow-up: some great points in the discussion on Hacker News.