Is there a word for "little optimization"? I mean, "early optimization" of code is the root of all evil, and that's kind of the same thing, but what I mean is changing your code and making it less logical/readable in order to do "little optimization", i.e. get some gains that might as well be had with an extra server or so?

(This post might not make sense, haven't had my coffee yet.)

# May 6, 2007

Joost are gonna have to do better if they want to be successful. This kind of sums up my feelings, too: "Unfortunately, the content they offered was crap. The colleague who
invited me said the same - after the first amazement at the coolness of
it all, you spend about 10 minutes zapping through the programs, and
then you switch it off, bored already".

Still, no good underestimating the Skype/Kazaa guys.

# May 6, 2007

Barcamp Brussels was really cool, very Belgian and I saw a few Belgian startups I didn't know of before. Wikifonia does sheet music, and not so-so (not live yet but they have a blog) is in the get-your-friends-reviews-of-places business. I was impressed with how smart they were.

# May 6, 2007

Old but good: "The user of social software is the
group, and ease of use should be for the group."

# May 5, 2007

You could use Anguish Languish instead of Lorem Ipsum.

# May 5, 2007

"Many years ago I received a tree identification book for Christmas. I was at my parents' home, and after all the gifts had been opened I decided to go out and identify the trees in the neighborhood. Before I went out, I read through part of the book. The first tree in the book was the Joshua tree because it only took two clues to identify it. Now the Joshua tree is a really weird-looking tree and I looked at that picture and said to myself, "Oh, we don't have that kind of tree in Northern California. That is a weird-looking tree, and I've never seen one before."

So I took my book and went outside. My parents lived in a cul-de-sac of six homes. Four of those homes had Joshua trees in the front yard. I had lived in that house for thirteen years, and I had never seen a Joshua tree. I took a walk around the block, and there must have been a sale at the nursery when everyone was landscaping their new homes-- at least 80 percent of the homes had Joshua trees in the front yards. And I had never seen one before! Once I was conscious of the tree-- once I could name it-- I saw it everywhere."

More

# May 5, 2007

A great post that finally explains what Amazon's SQS service does and (most importantly) why. I get it now.

# May 5, 2007

Twitter explained in 140 characters.

"Write short 140 character messages. See your friends' messages, and receive them as SMS on your cellphone. Popular with geeks. Growing crazy"

Try to explain twitter in 140 characters.

# May 4, 2007

I'll be at Barcamp Brussels tomorrow - looking forward to it! I don't think I'll talk, there seem to be loads of good talks already. Can't wait!

# May 4, 2007

Microsoft buying Yahoo?

Microsoft opened talks again with Yahoo to potentially acquire them for $50 billion, it seems. As we've seen before, Microsoft’s profits continue to be staggering: they could buy Yahoo at this price with the profits of less than 1 year. I don't think it'd be a fit though. All the cool people would leave Yahoo, and it would open up space for another powerhouse to emerge. What would be the pros for Microsoft?
  • They get a web savvy company
  • They get a strong brand and loads of traffic
  • They get to strengthen their position in the advertisement market
And the cons?
  • For us: Yahoo gets assimilated, which means Flickr and del.icio.us get assimilated.
  • For M$: they get lots of internal competition. Yahoo just shut down their own internal photo competition. But now Yahoo Mail and Hotmail would be under the same roof, and so on.
  • The cultural clashes might last 10+ years and be deadly.

# May 4, 2007

OK, for your wayfinding presentations, check out this:


# May 3, 2007

The memcached list is also regularly scaling porn. Today these gems:

"No clue if we’re the largest installation, but Facebook has roughly 200 dedicated memcached servers in its production environment, plus a small number of others for development and so on. A few of those 200 are hot spares. They are all 16GB 4-core AMD64 boxes, just because that’s where the price/performance sweet spot is for us right now (though it looks like 32GB boxes are getting more economical lately, so I suspect we’ll roll out some of those this year.)"

That's 3,200 GB (!) of cached data on Facebook. At, say, 20 KB of text per page, that's 160 million cached pages. (I know it's not entire pages they cache, but still.) Of course those boxes don't have all their memory for memcached, so it'd be less. :)
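Here's the back-of-envelope math as a small Python sketch. The server count and RAM per box come from the quote; the 20 KB page size is my guess from above, and `usable_fraction` is a hypothetical knob for how much of each box memcached really gets. Decimal units are assumed (1 GB = 1,000,000 KB).

```python
# Back-of-envelope estimate using the figures quoted above.
SERVERS = 200            # dedicated memcached servers (from the quote)
RAM_PER_SERVER_GB = 16   # memory per box (from the quote)
PAGE_SIZE_KB = 20        # assumed average size of a cached page


def cached_pages(servers, ram_gb, page_kb, usable_fraction=1.0):
    """Estimate how many pages of a given size fit in the cluster's RAM.

    usable_fraction is a hypothetical share of RAM available to memcached;
    1.0 reproduces the rough estimate in the post.
    """
    total_kb = servers * ram_gb * 1_000_000 * usable_fraction
    return int(total_kb // page_kb)


total_gb = SERVERS * RAM_PER_SERVER_GB
pages = cached_pages(SERVERS, RAM_PER_SERVER_GB, PAGE_SIZE_KB)
print(f"{total_gb} GB of RAM, roughly {pages:,} pages of {PAGE_SIZE_KB} KB")
```

Lowering `usable_fraction` (say to 0.75) shows how quickly the estimate shrinks once you stop pretending all the memory is cache.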

# May 3, 2007

Cruxy is live now, it's a great platform for bands and such to sell their music online *themselves*. Cruxy takes care of all the technical details, hosting, transcoding, SecondLife-ing and so on.

# May 3, 2007

Michael (not a newbie, exactly) gets sucked in by tagged's signup process which tries hard to spam all your contacts (and often succeeds).

# May 3, 2007

Yahoo is launching a new web-based messenger client at http://webmessenger.yahoo.com/ - no download! Good move.

# May 3, 2007

"Since a particular source is limited to the number of times it appears in the top ten" - from the horse's mouth: so on Yahoo you can only get a fixed number of pages in the top 10 of any query.

# May 3, 2007

Damn Mark Pilgrim still writes like the best. Silly season: "Reactions? “The web just got richer.” Well, somebody’s getting richer, but I doubt it’s gonna be the web."

# May 2, 2007

I learnt something today: sometimes, you repeat a lot of code all over, and it's NOT a good idea to put that in a separate function, coz that means you'd be abstracting away too much stuff. (But usually it is of course.)

# May 2, 2007

How I Unexpectedly Found Myself Doing IA Consulting For Startups (this is a post on my "professional" site. I haven't been able to figure out when to post here or there, any tips on that?).

# May 2, 2007

More thoughts on Silverlight (since that Digg story is sooo boring):

  • It's a runtime for Ruby, C# and Javascript, which means you can just put your js code straight into Silverlight and it will work the same, just much faster. Could be good for js-intensive apps.

# May 2, 2007

How often does the CEO of a startup fire himself? "Spending 6+ more months in development before re-entering the market is
not what I want to be doing, and as the single most expensive employee
in the company it really doesn’t make much sense to be paying me when 2
additional engineers would do the company far more in the way of value
creation."

# May 2, 2007

ok, too much silverlight hype for moi. This fox movie page: "Twentieth Century Fox and Microsoft® Silverlight™ bring a thrilling
video experience right to your web browser with this interactive player
and upcoming movie trailers. (Broadband connection recommended)." When really, it's worse than Youtube. Even old stodgy Apple trailer page is better than this.

# May 2, 2007

Silverlight does install fast, but I can't get any of the demos to work. It doesn't work with FF?

# May 2, 2007

A better link to the Myspace story. And some notes:

  • The site was tested in Perl and mysql, launched in ColdFusion (!) and Microsoft SQL.
  • Friendster started having problems in 2003 (30 sec loadtimes), when Myspace launched. Good timing.
  • The network effects (users inviting other users) started to kick in about 8 months after launch (and never stopped).
  • Every profile page displays data from multiple users, hence multiple db lookups need to be used and you can't cache too easily.
  • 5 major architecture revisions.
  • When they hit 100,000s of users in 2004 and then millions of users in 2005, lots of re-architecting was needed.
  • Since 7 million users in early 2005, the architecture has been roughly the same.
  • At 500,000 users (early 2004), the single db couldn't handle the load anymore. That sounds about right.
  • They did vertical partitioning (different databases for different parts of the site), but that never lasts long. Flickr uses horizontal partitioning, much better.
  • After the vertical partitioning, they didn't want to do all the code rewriting involved in horizontal partitioning, and decided to just get more expensive database servers. But they ended up being way too expensive for the power you get.
  • Finally, they started to chunk their tables into chunks of 1 million users.
  • There was still a single database that contains the user name and password credentials for all users.
  • We are early 2005 so far.
  • They switched from ColdFusion to ASP.NET.
  • Their architecture meant that some servers were very busy and others not. 2 people full-time redistributed data between servers.
  • Spring 2005: 17 million accounts. They added caching (way too late).
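The "chunks of 1 million users" idea from the notes above can be sketched in a few lines of Python. This is only an illustration, not Myspace's actual code: the function names are made up, and I'm assuming user ids are plain integers starting at 0. It also shows why a profile page with friends from all over means several database lookups.

```python
from collections import defaultdict

USERS_PER_SHARD = 1_000_000  # chunk size mentioned in the notes


def shard_for_user(user_id: int) -> int:
    """Range-based partitioning: ids 0..999,999 live on shard 0, and so on."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return user_id // USERS_PER_SHARD


def group_by_shard(user_ids):
    """Group user ids by shard so each database is queried only once."""
    groups = defaultdict(list)
    for uid in user_ids:
        groups[shard_for_user(uid)].append(uid)
    return dict(groups)


# A profile page showing the owner plus a few friends touches several shards.
profile_owner = 1_500_000
friends = [42, 999_999, 1_000_000, 6_999_999]
lookups = group_by_shard([profile_owner] + friends)
print(f"{len(lookups)} shards to query: {lookups}")
```

Range-based chunks are easy to reason about, but new (and often most active) users all land on the newest shard, which fits the note about some servers being very busy and others not.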



# May 1, 2007

(via) Inside Myspace: how they got big while doing *all* the wrong things. The "css" feature was a mistake, because they didn't know about xss. And so on.

I think someone will write one of these days the untold story of Myspace's growth. And yes, before the viral stuff kicked in, marketing had a lot to do with it, it seems.

# May 1, 2007

Amazon S3 is getting even cheaper! From an email to their developers:

"Finally, this means that we will be introducing a small request-based charge for each time a request is made to the service. Below are the details of the new pricing plan (also available on the Amazon S3 detail page):

Current bandwidth price (through May 31, 2007)
$0.20 / GB - uploaded
$0.20 / GB - downloaded

New bandwidth price (effective June 1, 2007)
$0.10 per GB - all data uploaded

$0.18 per GB - first 10 TB / month data downloaded
$0.16 per GB - next 40 TB / month data downloaded
$0.13 per GB - data downloaded / month over 50 TB
Data transferred between Amazon S3 and Amazon EC2 will remain free of charge

New request-based price (effective June 1, 2007)
$0.01 per 1,000 PUT or LIST requests
$0.01 per 10,000 GET and all other requests*
* No charge for delete requests"
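To see what the new plan means for a given workload, here's a rough Python calculator built only from the prices in the email. Some assumptions: 1 TB is treated as 1,000 GB, storage charges are left out because the excerpt doesn't quote them, and the function names and the example workload are invented for illustration.

```python
# Download tiers from the email: (tier size in GB, price per GB).
# Assumption: 1 TB = 1,000 GB.
DOWNLOAD_TIERS = [
    (10_000, 0.18),        # first 10 TB / month
    (40_000, 0.16),        # next 40 TB / month
    (float("inf"), 0.13),  # everything over 50 TB
]
UPLOAD_PER_GB = 0.10
PUT_LIST_PER_1000 = 0.01
GET_PER_10000 = 0.01
OLD_PER_GB = 0.20  # old flat price, upload and download alike


def download_cost(gb):
    """Walk through the tiers, charging each slice at its own rate."""
    cost, remaining = 0.0, gb
    for size, price in DOWNLOAD_TIERS:
        used = min(remaining, size)
        cost += used * price
        remaining -= used
        if remaining <= 0:
            break
    return cost


def new_cost(up_gb, down_gb, put_list_requests, get_requests):
    """Bandwidth plus request charges under the June 1, 2007 plan."""
    return (up_gb * UPLOAD_PER_GB
            + download_cost(down_gb)
            + put_list_requests / 1000 * PUT_LIST_PER_1000
            + get_requests / 10_000 * GET_PER_10000)


def old_cost(up_gb, down_gb):
    """Bandwidth-only charge under the plan valid through May 31, 2007."""
    return OLD_PER_GB * (up_gb + down_gb)


# Hypothetical month: 100 GB up, 1 TB down, 100k PUTs, 10 million GETs.
print(f"old: ${old_cost(100, 1000):.2f}")
print(f"new: ${new_cost(100, 1000, 100_000, 10_000_000):.2f}")
```

Whether the new plan comes out cheaper depends on the ratio of requests to bandwidth: lots of tiny objects fetched very often can cost more in request fees than they save on transfer.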

# May 1, 2007

I've only worked twice with companies using .NET for web interfaces, and in both cases the UI was a disaster and the usability problems guaranteed lots of consulting hours. Why is that? Or was that a fluke? (2 cases is hardly proof of anything.) I know it's possible to make usable and elegant web UI with any technology, but does .NET somehow encourage bad UI?

# May 1, 2007

1-page websites

I realized today that the apps I use everyday are all 1-page websites. Gmail. Bloglines. Google search. Actually, Google is the king of the 1-page websites - almost all their products consist of only 1 page.

Twitter is 1-page, because there is 1 page that you spend 90% of your time on. Flickr is multipage, although its main function (watch photos) is 1-page. Digg is 1 page. Mmm...

# Apr 30, 2007

But what bothers me about these immersive worlds (isn't there a better name?) is that they're all supercommercial. Why would that be?

# Apr 30, 2007

Damn, these "immersive worlds" in the browser are popular.

# Apr 30, 2007

You know this problem in IA when you design sites without real content, and before you know it there are loads of excerpts all over the page that don't really mean anything, ending in 3 dots "..."? It leads to a homepage like this one for example, just lots of excerpted content that doesn't really do much for anyone.

I got a word for that. Excerptitis. Maybe you have a better one?


# Apr 29, 2007

Comments work :) It's been a while!

# Apr 29, 2007

Ah, I messed up the site, but now it works again, and THE COMMENTS WORK! (excuse the all-caps)

The comments work! Yey!

# Apr 29, 2007

April was not only the hottest, the driest and the warmest April ever in Belgium, but most likely (if there's no rain tomorrow) it will also be the first month ever without any rain at all.

So far so good, life in Belgium :) The weather has been incredible.

# Apr 29, 2007

A bunch of presentations on scaling websites: twitter, Flickr, Bloglines, Vox and more.

(I changed the title because "top 10" posts are indeed sucky. Also: looking for my colombia travel site?)

By the way, here's the RSS feed of my blog, in case you'd like to subscribe.

I always love to read scaling discussions, especially about popular web apps, and there are loads of them out there. Here's my overview of the best. By the way, the best book on scaling apps I've ever read is Building Scalable Websites, by Cal Henderson (the Flickr guy).

It's dog-eared on my desk, and taught me about sharding (which I used extensively for mefeedia). Sharding is when you cut a really big table into pieces, so you can put those on separate servers. It means you have to make changes to your code, and your database isn't so database-y anymore, but it works. For example, online games use sharding to grow their virtual worlds, because there's no way they could serve all that information from 1 db cluster.
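To make that concrete, here's a minimal sharding sketch in Python. It's an illustration only: the `ShardedTable` class, the crc32-based routing and the in-memory dicts standing in for servers are all assumptions for the example, not how mefeedia or any game actually does it.

```python
import zlib


class ShardedTable:
    """A toy 'table' split across N shards; each dict stands in for a server."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard(self, key):
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, row):
        self._shard(key)[key] = row

    def get(self, key):
        # A lookup by key only ever touches one shard.
        return self._shard(key).get(key)

    def count_where(self, predicate):
        # No single server can answer this: every shard has to be asked.
        return sum(
            1
            for shard in self.shards
            for row in shard.values()
            if predicate(row)
        )


table = ShardedTable(shard_count=4)
table.put("video:1", {"title": "Barcamp talk", "lang": "nl"})
table.put("video:2", {"title": "Scaling slides", "lang": "en"})
table.put("video:3", {"title": "Brussels vlog", "lang": "nl"})

print(table.get("video:2"))
print(table.count_where(lambda row: row["lang"] == "nl"))
```

The `count_where` method is the "not so database-y" part: anything that used to be a single query over the whole table becomes a fan-out over all shards in your own code.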

Scaling Twitter with Ruby.

Twitter is hot today, and they ran into some serious scaling problems, although the app itself is quite simple. It consists of messages of maximum 140 characters. Lessons are the same as most apps: Memcache like crazy, and optimize the database (the biggest bottleneck most of the time).
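"Memcache like crazy" usually boils down to the same pattern: look in the cache first, hit the database only on a miss, then store the result. Here's a small Python sketch of that idea. `TinyCache` is an in-process stand-in I made up so the example runs without a memcached server, and `load_timeline_from_db` is a hypothetical slow query; neither is Twitter's code.

```python
import time


class TinyCache:
    """In-process stand-in for memcached: get/set with an expiry time."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired, treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


db_queries = 0  # counts how often the "database" is actually hit


def load_timeline_from_db(user):
    """Hypothetical slow database query."""
    global db_queries
    db_queries += 1
    return [f"{user}: message {i}" for i in range(3)]


cache = TinyCache()


def get_timeline(user):
    """Cache-aside read: try the cache, fall back to the database."""
    key = f"timeline:{user}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    timeline = load_timeline_from_db(user)
    cache.set(key, timeline, ttl_seconds=60)
    return timeline


first = get_timeline("alice")
second = get_timeline("alice")
print(f"database queries: {db_queries}")
```

The hard part in real life is not this read path but invalidation: deciding when a cached timeline is stale and has to be dropped or rewritten.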

Also, Ruby on Rails scales pretty much the same way as PHP and other similar languages: shared nothing architecture. Shared nothing means that there is no 1 thing that is shared by all servers, since that would become a bottleneck.

PHP, for example, has shared nothing architecture out of the box, except perhaps for sessions, but that's easily solved by storing sessions in a db (which then has its own scaling approach) and not in the filesystem. Here's a talk by Rasmus Lerdorf that explains scaling with PHP5. (Here's the mp3 audio recorded by Niall Kennedy).

Blaine Cook made this presentation:

Scaling Flickr.

Cal Henderson wrote the above book, and also has a good presentation: Scaling Flickr slides as PDFs.

One of the problems you get into when scaling something like Flickr where you store LOTS of stuff, is that you can't just store that on a hard drive anymore: it's not big enough. Apart from just using Amazon's S3 service (which rocks - I used it for mefeedia and I know lots of startups who use it), there are other solutions. A good presentation of that by Cal is this one:

Cal (he's a busy dude) also made this presentation about scaling web apps, generally:

John Allspaw (flickr plumbr) also has a good presentation about scaling Flickr:

Scaling LiveJournal.

LiveJournal was one of the first social networks, before that word meant anything, and they've partly invented how to scale standard php/mysql/apache apps. They developed memcached, which is now used by almost anyone who wants to scale their site.

Brad Fitzpatrick has a good set of slides on how they evolved the service, here's a PDF version. And here's the slideshow embedded:

Kevin Rose mentioned this was "the bible for scaling Digg" - and I think quite a few other web apps are based on this.

Six Apart.

The LiveJournal guys with all their scaling expertise were acquired by Six Apart, and they soon launched Vox. And of course, here's a presentation on making Vox scalable:

Bloglines.

Bloglines' scaling problems were slightly different from your average web app, since they are an aggregator of feeds. That means they have billions of blogposts they have to keep and serve to users, and that creates its own scaling problems. The Bloglines approach was to, instead of using a database, just store all that stuff in a special filesystem. Today it'd be easier to do this since there are a few filesystems that do that, or you could just go with S3 again. Mark Fletcher (who also sold Onelist to Yahoo which is now Yahoo Groups) has given a few talks on scaling Onelist and Bloglines: here's the mp3 audio version, and here's the PDF of that talk. And a text transcript.

Last.fm

Last.fm is one of the aggregation-type apps: they gather a lot of data about what music you listen to. Similarly to Bloglines, that causes its own scaling problems:

Slideshare.

All the slides in this post are hosted by Slideshare, an incredible service by my fellow information architect Rashmi Sinha and team. When I found out about the project, I emailed her: "brilliant and so obvious once you think of it". Like many startups, they use S3 to serve their content, and they have the obligatory yet interesting slides to explain how:

I haven't linked to lots of good thinking about scaling, or to technical resources and stuff. But the presentations should get you going in the world of memcached, perlbal, shared nothing and federation :) Enjoy!

PS: See also How I Unexpectedly Found Myself Doing Consulting For Startups (this is a post on my "professional" site. I haven't been able to figure out when to post here or there, any tips on that?).

Update: more presentations.

Another great talk in video this time, from the MySQL Bay Area Community Meetup, May 2007:

[youtube http://www.youtube.com/watch?v=Oa1guca-gFQ]

Finally, Dan Pritchett has a good presentation on scaling eBay (PDF). 26 Billion SQL queries per day! 300+ new features per quarter! 4 architecture versions since 1998 and some pretty crazy scaling of the search.

New: presentation on how Facebook uses PHP APC cache (PDF).

A talk on Youtube scalability: "In the summer of 2006, they grew from 30 million pages per day to 100 million pages per day, in a 4 month period. Thumbnails turn out to be surprisingly hard to serve efficiently. (I ran into this with mefeedia too, luckily Amazon S3 came to the rescue by then.)" Youtube uses Python, Apache, MySQL, Memcached.

NEW: Front end scaling is important too, and often ignored. Here's a good presentation from the Yahoo guys:

# Apr 29, 2007

Microsoft's profits continue to be staggering: with a quarterly revenue of $14.4 billion, it takes Microsoft only:
  • 10 hours or so (yes, hours!) to exceed Red Hat’s quarterly net income of $20.5 million.
  • four days to exceed Research In Motion’s quarterly net income of $187.9 million.
  • four days to exceed Starbucks’ quarterly net income of $205 million.
  • one week to exceed Nike’s quarterly net income of $350.8 million.
  • two weeks to exceed McDonalds’ quarterly net income of $762 million.
  • two weeks to exceed Apple’s quarterly net income of $770 million.
  • 18 days to exceed Google’s quarterly net income of $1 billion.
  • 23 days to exceed Coca-Cola’s quarterly net income of $1.26 billion.
  • five weeks to exceed IBM’s quarterly net income of $1.85 billion.
  • 10 weeks to exceed Wal-Mart’s quarterly net income of $3.9 billion.
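The arithmetic behind a list like this is easy to sketch in Python. The quarterly figure is left as a parameter on purpose: the durations in the list seem to fit a quarterly total of roughly $5 billion better than the $14.4 billion revenue figure. That is an inference from the numbers in the list, not something stated here, and the 91-day quarter is an assumption too.

```python
DAYS_IN_QUARTER = 91  # assumption: a 13-week quarter


def days_to_exceed(target, quarterly_total, days_in_quarter=DAYS_IN_QUARTER):
    """Days needed to accumulate `target` at a constant daily rate."""
    return target / (quarterly_total / days_in_quarter)


def implied_quarterly_total(target, days, days_in_quarter=DAYS_IN_QUARTER):
    """Quarterly total that would accumulate `target` in `days` days."""
    return target / days * days_in_quarter


# Figures taken from the list above.
google_net_income = 1_000_000_000
quarterly_revenue = 14_400_000_000

# Days needed if the $14.4 billion revenue figure were the rate.
print(round(days_to_exceed(google_net_income, quarterly_revenue), 1))

# Quarterly total (in billions) implied by the "18 days" entry.
print(round(implied_quarterly_total(google_net_income, 18) / 1e9, 2))
```

Running the second function on the other entries (Coca-Cola at 23 days, for example) gives a similar total, which is a quick way to sanity-check which number a list like this was really built from.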
# Apr 29, 2007

This looks like a good PHP S3 API.

# Apr 28, 2007

Google investing 250 million Euro in a huge Belgian data center.

# Apr 27, 2007

In the continuing saga of illegible domain names, I've recently purchased wayut.com and xofy.net. Once you know what they are they're actually easy to remember. 2 possible upcoming projects. Wanna guess?

# Apr 27, 2007

What's wrong with the workhack todo list: it disappears todo items that are done. I like to see what I've accomplished, to get that feeling of satisfaction, of knowing you've done at least *something* the past 2 days.

# Apr 27, 2007

you have better odds of winning $5M in the NY
Lottery than you do of selling your company to Google (or Yahoo) - in 2005

# Apr 27, 2007

Seems that prices for good developers in Bangalore are skyrocketing. That's a good thing.

# Apr 27, 2007

Om: But there was a lesson learned: never be the me-too player in your business category.

# Apr 27, 2007

http://www.thechickentest.com/vid/IASummit2007/Information_Architecture_and_Ethical_Design.mp3

# Apr 26, 2007

http://www.thechickentest.com/vid/IASummit2007/Real_Information_Architecture_New_Mighty_Deeds.mp3

# Apr 26, 2007

http://www.thechickentest.com/vid/IASummit2007/Using_Search_Analytics_to_Diagnose_Whats_Ailing_your_Information_Architecture.mp3

# Apr 26, 2007

http://www.thechickentest.com/vid/IASummit2007/ProjectTouchstones-JessMcMullin.mp3

# Apr 26, 2007

http://www.thechickentest.com/vid/IASummit2007/Systems_Thinking_Rich_Mapping_and_Conceptual_Models.mp3

# Apr 26, 2007