Some SQL Tricks of an Application DBA

A couple of tips related to database-centric (web)app development. Most of these were fairly basic, but one stood out in particular: the order in which data is inserted affects the on-disk ordering (and therefore the indexes chosen, and therefore performance). Postgres captures this notion using the correlation statistic:

Statistical correlation between physical row ordering and logical ordering of the column values. This ranges from -1 to +1. When the value is near -1 or +1, an index scan on the column will be estimated to be cheaper than when it is near zero, due to reduction of random access to the disk.

Other highlights:

Back then, before AWS RDS, Azure, Google Cloud and the rest of them cloud services, there were two types of DBAs:

  • The Infrastructure DBA was in charge of setting up the database, configuring the storage and taking care of backups and replication. After setting up the database, the infrastructure DBA would pop up from time to time and do some “instance tuning”, things like sizing caches.
  • The Application DBA got a clean database from the infrastructure DBA, and was in charge of schema design: creating tables, indexes, constraints, and tuning SQL. The application DBA was also the one who implemented ETL processes and data migrations. In teams that used stored procedures, the application DBA would maintain those as well.

Application DBAs were usually part of the development team. They would possess deep domain knowledge so normally they would work on just one or two projects. Infrastructure DBAs would usually be part of some IT team, and would work on many projects simultaneously. Until this day I like to say I’m a DBA that knows how to develop applications, and not a developer that knows his way around the database.

  • Update Only What Needs Updating
  • Disable Constraints and Indexes During Bulk Loads
  • Use UNLOGGED Tables for Intermediate Data
  • Avoid Indexes on Columns With Low Selectivity
  • Use Partial Indexes
  • Always Load Sorted Data
  • Index Columns With High Correlation Using BRIN