The backend of a global visual effects pipeline

  • Show → Sequence → Shot → Task
    • Different VFX studios bid for specific shots; multiple studios do work for a single movie
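The hierarchy above maps naturally onto nested records. A minimal sketch (all class and field names are illustrative, not the studio's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str          # e.g. "comp", "lighting", "fx"

@dataclass
class Shot:
    code: str          # e.g. "sq010_sh0040"
    studio: str        # the studio that won the bid for this shot
    tasks: list = field(default_factory=list)

@dataclass
class Sequence:
    code: str
    shots: list = field(default_factory=list)

@dataclass
class Show:
    title: str         # the movie / episodic project
    sequences: list = field(default_factory=list)
```

Because studios bid per shot, the `studio` attribute lives on `Shot`, not on `Show`.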
  • Render Farm
    • A “Manager” accepts jobs, provides status updates, and manages resources efficiently.
    • Their Vancouver farm has 50k+ cores and 150TB RAM in total
    • A Netflix project they worked on required 120h/frame on a single node.
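The Manager's core loop can be sketched as an in-memory scheduler that accepts jobs, reports status, and only dispatches while free cores remain. Everything here is a hypothetical simplification of what a real farm manager does:

```python
from collections import deque

class FarmManager:
    """Toy farm manager: FIFO queue plus a core budget."""

    def __init__(self, total_cores):
        self.free_cores = total_cores
        self.pending = deque()     # (job_id, cores) in submission order
        self.running = {}          # job_id -> cores reserved
        self.status = {}           # job_id -> "pending" | "running" | "done"

    def submit(self, job_id, cores):
        self.pending.append((job_id, cores))
        self.status[job_id] = "pending"

    def dispatch(self):
        # Start pending jobs in order while resources allow.
        while self.pending and self.pending[0][1] <= self.free_cores:
            job_id, cores = self.pending.popleft()
            self.free_cores -= cores
            self.running[job_id] = cores
            self.status[job_id] = "running"

    def complete(self, job_id):
        self.free_cores += self.running.pop(job_id)
        self.status[job_id] = "done"
```

A real manager also handles priorities, preemption, and per-show quotas; this sketch only shows the accept/status/resource triad from the notes.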
  • Maya / Houdini / Nuke
    • All software exposes a Python interface
    • They use Python to glue things together
    • C++ is used for lower-level components, like shaders
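"Python as glue" often means building command lines for the DCCs' headless modes. A hedged sketch: `Render` (Maya's batch renderer wrapper) and `hython` (Houdini's Python shell) are typical binary names, but treat the exact flags and the `render_rop.py` helper as assumptions:

```python
def render_command(dcc, scene, frame_start, frame_end):
    """Build a headless render command for a given DCC (illustrative only)."""
    if dcc == "maya":
        # Maya ships a command-line renderer wrapper commonly invoked as `Render`.
        return ["Render", "-s", str(frame_start), "-e", str(frame_end), scene]
    if dcc == "houdini":
        # Houdini is typically driven headless via hython; render_rop.py is a
        # hypothetical driver script, not a shipped tool.
        return ["hython", "render_rop.py", scene, str(frame_start), str(frame_end)]
    raise ValueError(f"unknown DCC: {dcc}")
```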
  • Asset system
    • Keep track of all assets used by artists
    • Versioning, metadata, dependencies
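The three asset-system responsibilities above fit a small publish/lookup store. A minimal sketch, assuming a monotonically increasing integer version per asset name (field names invented for illustration):

```python
class AssetStore:
    """Toy asset registry: versioning, metadata, and dependencies."""

    def __init__(self):
        self.versions = {}   # name -> list of records; index i is version i+1

    def publish(self, name, path, metadata=None, depends_on=None):
        record = {
            "path": path,
            "metadata": metadata or {},
            "depends_on": depends_on or [],   # (name, version) pairs
        }
        self.versions.setdefault(name, []).append(record)
        return len(self.versions[name])       # the new version number

    def latest(self, name):
        return len(self.versions[name]), self.versions[name][-1]
```

Pinning dependencies as `(name, version)` pairs is what lets an old shot re-render exactly as published, even after the asset moves on.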
  • Developer machines have ~100GB of RAM, yet artists still can’t use their machines while a local render is running.
  • Python 2 is the standard, with a move to Python 3 planned by 2019.
  • Parts of the pipeline are Python + CLI tools like ffmpeg.
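A typical Python-wraps-CLI glue step: turning a rendered frame sequence into a review movie with ffmpeg. The flags are standard ffmpeg options; the paths and function name are placeholders:

```python
import subprocess

def make_review_movie(frame_pattern, out_path, fps=24, run=False):
    """Build (and optionally run) an ffmpeg command for a frame sequence."""
    cmd = [
        "ffmpeg",
        "-framerate", str(fps),
        "-i", frame_pattern,      # printf-style pattern, e.g. "comp.%04d.exr"
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",    # broadest player compatibility
        out_path,
    ]
    if run:
        subprocess.run(cmd, check=True)
    return cmd
```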
  • Disk
    • NFS is commonly used.
    • Their NFS cluster has 21 nodes and totals 4PB.
    • Each node has 100GB RAM and a 1.5TB SSD cache.
    • Not a good fit to transfer data across regions (Vancouver → Melbourne, for example)
      • NFS-over-WAN is error-prone and slow.
      • Use a custom crawl+transfer job that runs on-demand instead
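The crawl+transfer idea can be sketched as: walk the tree locally, collect files changed since the last sync, and hand the batch to a bulk copier (rsync here) rather than reading file-by-file over NFS across the WAN. The function names and the choice of rsync are assumptions:

```python
import os

def files_newer_than(root, since_mtime):
    """Crawl `root` and return files modified after `since_mtime`."""
    out = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since_mtime:
                out.append(path)
    return sorted(out)

def transfer_command(paths, remote):
    # -a preserves attributes; --partial lets interrupted WAN copies resume.
    return ["rsync", "-a", "--partial", *paths, remote]
```

Batching like this also makes retries cheap: a failed transfer re-runs the rsync, instead of surfacing as mysterious NFS errors inside an application.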
  • Use REZ to manage different versions of installed software based on the shot.
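Rez resolves a per-shot environment from declarative package definitions. A hedged sketch of a `package.py` (the package name and the pinned tool versions are invented for illustration):

```python
# package.py -- hypothetical per-shot environment package for rez
name = "shot_env"
version = "1.0.0"

requires = [
    "maya-2018",
    "nuke-11",
    "python-2.7",
]

def commands():
    # rez evaluates this to build the resolved environment for the shot.
    env.PATH.prepend("{root}/bin")
```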
  • Abstract services to keep the pipeline humming while improving efficiency on the backend
    • Every one of these services (like the asset system and the render farm) exposes an API
    • The API node runs consul, logstash, and zabbix
    • Plenty of distributed systems challenges here!
    • RabbitMQ to keep geographically distinct asset pools in sync.
    • Typically a master in one region + read replicas in others.
    • Evaluating CouchDB multi-master.
    • Ansible for provisioning.
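One way the RabbitMQ sync could look: the master region publishes "asset published" events to a fanout exchange, and each replica region consumes them and pulls the new version. A hedged sketch using the `pika` client; the exchange name and event fields are assumptions:

```python
import json

def asset_event(name, version, region):
    """Serialize an asset-published event for cross-region sync."""
    return json.dumps({"asset": name, "version": version, "origin": region})

def publish_event(body, host="localhost", exchange="asset-sync"):
    # Imported here so the module loads without a broker or pika installed.
    import pika
    conn = pika.BlockingConnection(pika.ConnectionParameters(host))
    channel = conn.channel()
    channel.exchange_declare(exchange=exchange, exchange_type="fanout")
    # Fanout ignores the routing key; every bound regional queue gets a copy.
    channel.basic_publish(exchange=exchange, routing_key="", body=body)
    conn.close()
```

With a fanout exchange, adding a new region is just binding one more queue; the master never needs to know how many replicas exist.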