Why SQLite Performance Tuning Made Bencher 1200x Faster

bastawhiz · 2024-04-18T20:50:54

It's a little ironic that a startup that bills itself as

> Detect and prevent performance regressions before they make it to production with continuous benchmarking

Would ever write

> I got feedback from a user that their Bencher Perf Page was taking a while to load. So I decided to check it out, and oh, man were they being nice.

I get that this is an edge case, but it doesn't feel awesome that they had such a painful performance issue that went undetected!

reply

epompeii · 2024-04-20T13:29:16

Definitely ironic and embarrassing!

I figured I could either hide in shame or share some hard learned wisdom, and I opted for the latter.

reply

pizza234 · 2024-04-18T08:25:26

1. clickbat (reminds patent descriptions ;)); the article itself is about routine query optimization

2. there is no "materialized view" - the view created is a standard one; subquery materialization != materialized views. see wikipedia article (https://en.wikipedia.org/wiki/Materialized_view) to find what materialized views are

chx · 2024-04-18T09:05:56

That bonus bug could be read as the perfect case to argue against using an ORM

> .left_join(schema::alert::table.on(view::metric_boundary::boundary_id.eq(schema::alert::boundary_id.nullable())))

instead of

> LEFT OUTER JOIN alert ON (alert.boundary_id = metric_boundary.boundary_id)

klabb3 · 2024-04-18T15:14:27

Is that from an orm? Looks like a query builder to me.

The main thing you left out is the type conversions with plain sql. So a fair comparison would include the e2e flow, imo.

That said, your point is very much valid because of the nested parentheses - I think the main issue is that SQL is an “infix”(?) dsl? It’s strange because Rust has perfectly good proc macros for this type of stuff.

reply

epompeii · 2024-04-20T13:38:45

If anything, this experience has made me appreciate Diesel even more!

Having strongly typed SQL has prevented these sort of bugs from happening at compile time. The only reason I ran into trouble here was that I went into `unsafe` land without proper precaution. That's on me, not the ORM.

reply

orthoxerox · 2024-04-18T09:13:44

metric_boundary.as("m").join(alert.as("a"), $"a.boundary_id" === $"m.boundary_id")

Scala is sometimes dangerously syntax-sweet, but Spark does some things right.

homarp · 2024-04-17T20:20:19

Bencher is a suite of continuous benchmarking tools. https://github.com/bencherdev/bencher

speed_spread · 2024-04-18T12:00:54

Any story about making something 1200x faster is going to be about something being 1197x too slow to begin with. It's like celebrating a performance failure.

epompeii · 2024-04-20T13:47:20

I think it was definitely unoptimized to begin with. The major slow down didn't happen until the model evolved though, as I mention in the Background section. Not catching that change was certainly a performance failure!

sharpshadow · 2024-04-19T13:38:08

Exactly! If one can optimise for 1200x it was bad from the start.

chmaynard · 2024-04-18T07:22:18

Nice writeup!

I much prefer using SQLite via its CLI (sqlite3) as opposed to embedded SQL statements via an API. Much easier to read and understand what's going on.

magicalhippo · 2024-04-18T11:41:06

The culprit for the slowdown is this section, reformatted to be readable:

    LEFT OUTER JOIN (
      boundary 
      INNER JOIN threshold ON (threshold.id = boundary.threshold_id)
      INNER JOIN model ON (model.id = boundary.model_id)
      LEFT OUTER JOIN alert ON (alert.boundary_id = boundary.id)
    ) ON (boundary.metric_id = metric.id)

From what I can gather, it does a full scan on "boundary" table so it can do the inner joins before doing the left outer join against "metric".

The author then introduces the "metric_boundary" view which does the left outer join between "metric" and "boundary". However the SQL using the "metric_boundary" view is not shown, and from the query planner results I can't see that the new SQL is equivalent?

That is, without the "metric_boundary" view, one immediate thought on how to optimize the original query is to avoid doing the "threshold" and "model" inner joins, instead flattening the query:

    LEFT OUTER JOIN boundary ON (boundary.metric_id = metric.id)
    LEFT OUTER JOIN threshold ON (threshold.id = boundary.threshold_id)
    LEFT OUTER JOIN model ON (model.id = boundary.model_id)
    LEFT OUTER JOIN alert ON (alert.boundary_id = boundary.id)

This should allow the database to just search on the primary keys, similar to the final result in the article. However, this is not equivalent. It will include values from "boundary" even if there are no matches for "threshold" and "model", unlike the original query. In the original, "boundary" rows will not be returned unless there's also linked rows in "threshold" and "model".

However, from the query planner results after introducing the "metric_boundary" view, from what I can see the author has in effect done just the same kind of flattening?

    |--SEARCH boundary USING INDEX sqlite_autoindex_boundary_2 (metric_id=?) LEFT-JOIN
    |--SEARCH threshold USING INTEGER PRIMARY KEY (rowid=?) LEFT-JOIN
    |--SEARCH model USING INTEGER PRIMARY KEY (rowid=?) LEFT-JOIN

Am I missing something here? Did the author not really need to reject "boundary" table rows which didn't have any corresponding rows in "threshold" and "model" table, ie didn't really need the inner joins, and didn't realize this?

larschdk · 2024-04-18T09:51:40

I thought this would be about optimizing or tuning SQLite settings, but it's about query and schema optimization.

zb3 · 2024-04-18T09:13:39

Are there materialized views in sqlite?

stefanos82 · 2024-04-18T09:35:07

I guess..? :thinking: https://sqlite.org/lang_with.html#materialization_hints

zb3 · 2024-04-18T10:33:07

but it seems it's for CTEs, I was asking aboyt materialized views where their result is physically stored all the time

eddd-ddde · 2024-04-18T14:33:31

Not like in postgres, but you can do stored computed values.

fforflo · 2024-04-18T10:59:25

Not really. In this context they're a hint to the optimizer to evaluate the CTE result store=materialize it somewhere and use that snapshot when the CTE is queried subsequently.

The alternative means the CTE is copy-pasted essentially and them evaluated.

The closest one can get to materialized views in SQLite is probably with CREATE TABLE AS. Drop and recreate.

epompeii · 2024-04-20T14:18:57

I think you are totally right!

The VIEW itself is not a "materialized" view, but when the (normal) view is instantiated within a single query it is treated as MATERIALIZED because it is used multiple times in the query execution: https://www.sqlite.org/lang_with.html#materialization_hints

reply