SQL for Distributed Systems
Writing SQL for data aggregation and reporting requires a different mindset from writing SQL for the backend of a web application. In a data warehousing application, we need ways to backfill data efficiently and to run our SQL quickly at scale.

A backfill populates a table with data for the past X days. Because a backfill covers multiple days at a time, it must be written so that it does not leak data between days or produce duplicate rows.

Templates are very useful for generating SQL because they allow parameterization and let us embed Python logic inside a SQL script. Keeping that Python inside a template file with a .sql extension lets your IDE apply SQL syntax highlighting, supports separation of concerns, and makes the database logic easier to discover with file search.

Updates are a problem because data warehouses are built for high-throughput bulk reads and writes rather than for modifying individual rows, so UPDATE statements tend to be poorly optimized there.
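The following is a minimal sketch of an idempotent, templated backfill. It assumes a hypothetical `orders` source table and a hypothetical `daily_revenue` reporting table keyed by a `ds` date column. Each day's rows are deleted and rebuilt inside one transaction, so rerunning a day replaces its rows instead of duplicating them. The `WHERE order_date = '$ds'` filter keeps other days' data out of each partition. SQLite (through Python's `sqlite3` module) stands in for a warehouse engine, and the standard-library `string.Template` stands in for a template engine such as Jinja. Both substitutions are assumptions made only to keep the example self-contained.

```python
import sqlite3
from datetime import date, timedelta
from string import Template

# Hypothetical template for rebuilding one daily partition of a reporting table.
# In a real project this text would live in its own .sql file; string.Template is
# used here only as a stand-in for a fuller template engine such as Jinja.
DAILY_REVENUE_SQL = Template("""
BEGIN;
-- Remove whatever this day already holds, so reruns never duplicate rows.
DELETE FROM daily_revenue WHERE ds = '$ds';
-- Rebuild the day from source rows for that date only, so no other day leaks in.
INSERT INTO daily_revenue (ds, total)
SELECT '$ds', COALESCE(SUM(amount), 0)
FROM orders
WHERE order_date = '$ds';
COMMIT;
""")


def backfill(conn: sqlite3.Connection, start: date, end: date) -> None:
    """Rebuild every daily partition from start to end, inclusive."""
    day = start
    while day <= end:
        # ds comes from a date object, so interpolating it is safe here;
        # untrusted input should go through bound parameters instead.
        conn.executescript(DAILY_REVENUE_SQL.substitute(ds=day.isoformat()))
        day += timedelta(days=1)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_date TEXT, amount REAL);
CREATE TABLE daily_revenue (ds TEXT, total REAL);
INSERT INTO orders VALUES
  ('2024-01-01', 10), ('2024-01-01', 5),
  ('2024-01-02', 7), ('2024-01-03', 3);
""")

backfill(conn, date(2024, 1, 1), date(2024, 1, 2))
backfill(conn, date(2024, 1, 1), date(2024, 1, 2))  # rerun the same range

rows = conn.execute("SELECT ds, total FROM daily_revenue ORDER BY ds").fetchall()
print(rows)  # [('2024-01-01', 15.0), ('2024-01-02', 7.0)]
```

Running the backfill twice over the same range still leaves exactly one row per day. The 2024-01-03 order never appears because it falls outside the requested range. Rebuilding whole partitions instead of issuing row-level UPDATEs also avoids the update cost described above. Engines that support partition overwrites, such as Hive with `INSERT OVERWRITE`, can express the same delete-and-rebuild pattern in a single statement.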