# Scaling Instagram

• The talk was broadly themed around the idea of incremental, just-enough improvements at the infra level, and realizing when to make the next one.
• Instagram scaled to 30m MAUs from 2010-12, and to 300m MAUs by 2015 (when this talk was given).
• 2012 Stack
• 2015 Stack
• Internal policy is to have all requests return within 3 seconds.
• This seems almost too liberal; is this possibly still true?
• Initially used “gearman” as a task queue, and this served them well for 18 months.
• Once they outgrew this, they used a home-grown sharding scheme which took them a little further.
• They eventually switched to Celery + RabbitMQ.
• Gearman’s average enqueue time was 60ms at p50 and 1 second at p95.
• With RabbitMQ this dropped to 5ms at p50 and tens of ms at p95.
• Company culture of getting existing infra to work for them over switching to new infra unless absolutely necessary.
• I can see how this would be a double-edged sword, but this seems like sound advice for a small engineering team.
• Deployments started with fabric and git pull
• Eventually moved to uploading an artifact to S3, and having Fabric pull this artifact down to each machine + perform the switch via ln
• Once this grew cumbersome they built what seems to be a manual lock service, so engineers could ensure they weren’t stepping on each others’ toes.
• This eventually led to a real CD pipeline.
• “Don’t automate something until you understand it really well by doing it manually a lot”
• Their search system started with Postgres LIKE queries, which is actually not that bad for prefix searches, but pretty much O(n) for anything else.
• This gave way to a solr box that allowed more expressive searches. Lack of clustering led them to outgrow this piece of infra.
• Next up was Elasticsearch, which worked well, but was too expensive in terms of ops load for a team of their size.
• Confident that they could’ve made this work with a dedicated search team.
• Used Facebook’s internal graph database (unicorn), which allowed for very expressive s-exp based searches.
• The Explore page was entirely driven by the expressiveness of the searches they could perform.
• Started of with top-liked images globally
• Moved to things people you follow like
• Realized this didn’t work because you don’t necessarily share the same taste as everyone you follow.
• Moved to things people you liked like, which ended up working really well.

Edit