Rust Stream API visualized and exposed

withoutboats3 · 2024-04-25T15:44:39

The problem here isn't with the concept of Streams (which are good) but specifically with the "buffered stream" APIs provided by the futures crate (i.e. the buffered and buffer_unordered methods). Their lack of concurrency with the processing before or after them is a known problem, as the blog post alludes to at the end; I would discourage users from using these APIs without considerable care.

I've explored this subject on my blog, including possible solutions to the problem with these APIs:

https://without.boats/blog/futures-unordered/

https://without.boats/blog/poll-progress/

https://without.boats/blog/the-scoped-task-trilemma/

Also, the rendering and visualization aspect of this is very cool!

gdcbe · 2024-04-25T16:07:38

Please keep on blogging like you do, withoutboats. Your articles are gems; I learn something new from every one.

Thanks to the work of you and others, I do have hope it will all be better in the future.

That said, it might be my low standards due to many scars from my C++ background, but I’m already plenty happy with what we have today, so the fact that it will get even better in the next few years is the cherry on the cake for me.

bionhoward · 2024-04-26T10:54:42

One thing I’m struggling with is finding blog posts that show how to do async now, as opposed to ideas for how async could be improved or done differently. Where’s the best pragmatist’s guide to async Rust in 2024?

For example, do you think it’s better to write “async fn” (call this the high-level API) and make them fairly small and contained, or is it better to impl Future and use enum Poll for low-level control of higher-level abstract computations (i.e., the one and only point of async is “Pending”)?

(Hopefully that makes sense.) High-level API on low-level components, low-level API on high-level components, or something else?

Is a future a structure, a function, or both (a trajectory?)

I find the difference between concurrency and parallelism too subtle to be satisfyingly or obviously accurate or useful. Might there be a better way to separate or rename these concepts to better convey how they differ?

There are more questions. I wonder if you could show us how to write runtime-agnostic async code today. Admitting the ecosystem has holes, and setting aside how to solve every problem with async Rust, how do we practitioners future-proof our async code right now so it doesn't get overfit to, or stuck on, the details of the Tokio runtime?

Sorry for the long comment, and no worries if you’re busy; these are just ideas for questions you could optionally write about.

John23832 · 2024-04-26T12:38:59

> For example, do you think it’s better to write “async fn” (call this the high-level API) and make them fairly small and contained, or is it better to impl Future and use enum Poll for low-level control of higher-level abstract computations (i.e., the one and only point of async is “Pending”)?

You would almost never implement Future by hand unless you needed that specific future for a particular purpose; otherwise, just write async fn.
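
For illustration, here is a minimal std-only sketch of the two styles. `YieldTimes`, `add_after_yield`, and the busy-polling `block_on` are hypothetical names made up for this example, not library APIs; a real program would let a runtime such as Tokio do the polling. The hand-written future manages its own state and wake-up, while the `async fn` gets an equivalent state machine generated by the compiler.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hand-written future: returns Pending `remaining` times, then Ready.
// All state and the wake-up call are managed manually.
struct YieldTimes {
    remaining: u32,
}

impl Future for YieldTimes {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.remaining == 0 {
            Poll::Ready(())
        } else {
            self.remaining -= 1;
            // Ask to be polled again. A real leaf future would store the
            // waker and call it when an event (I/O, timer) fires.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// The usual high-level style: the compiler builds the state machine.
async fn add_after_yield(a: u32, b: u32) -> u32 {
    YieldTimes { remaining: 2 }.await;
    a + b
}

// Toy executor: a waker that does nothing, plus a busy-poll loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let sum = block_on(add_after_yield(2, 3));
    println!("{sum}"); // prints "5"
}
```

The manual `impl Future` is the low-level tool for leaf futures (timers, I/O readiness, custom primitives); the code built on top of them is normally plain `async fn`.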

> Is a future a structure, a function, or both (a trajectory?)

It is a structure that implements the Future trait. An async fn is a function that returns such a structure.
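
A small std-only sketch of both halves of that answer, using hypothetical names: calling `double` only constructs a future value and runs none of its body; the body runs when the value is polled. `double_desugared` shows roughly what the `async fn` signature means, written with an `async` block. `poll_once` and `NoopWake` are a toy stand-in for an executor.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// An async fn: calling it builds a future value; the body runs only when polled.
async fn double(x: u32, started: &Cell<bool>) -> u32 {
    started.set(true);
    x * 2
}

// Roughly what the async fn signature means: a plain function returning
// some anonymous type that implements Future (here via an async block).
fn double_desugared(x: u32) -> impl Future<Output = u32> {
    async move { x * 2 }
}

// Toy executor pieces: a waker that does nothing, and a single poll.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_once<F: Future>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}

fn main() {
    let started = Cell::new(false);
    let mut fut = Box::pin(double(21, &started));
    assert!(!started.get()); // the future exists, but its body has not run
    assert_eq!(poll_once(fut.as_mut()), Poll::Ready(42));
    assert!(started.get()); // polling ran the body to completion

    let mut fut2 = Box::pin(double_desugared(21));
    assert_eq!(poll_once(fut2.as_mut()), Poll::Ready(42));
    println!("ok");
}
```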

> I find the difference between concurrency and parallelism too subtle to be satisfyingly or obviously accurate or useful. Might there be a better way to separate or rename these concepts to better convey how they differ?

The difference between concurrency and parallelism is way bigger than Rust. I'd suggest just learning the concept.
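
One way to make the distinction concrete, sketched with std only and hypothetical names (`task`, `run_concurrently`, `run_in_parallel`): concurrency is about interleaving several tasks, which can happen on a single thread; parallelism is about work literally running at the same time, which needs multiple threads or cores. The busy round-robin loop below is a toy stand-in for a real executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// Returns Pending once, then Ready: a single suspension point.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// Records two steps, suspending after each one.
async fn task(name: &'static str, log: Arc<Mutex<Vec<String>>>) {
    for i in 0..2 {
        log.lock().unwrap().push(format!("{name}{i}"));
        YieldOnce(false).await;
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Concurrency: one thread polls two tasks in turn until both finish.
fn run_concurrently() -> Vec<String> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut tasks: Vec<Pin<Box<dyn Future<Output = ()>>>> = vec![
        Box::pin(task("a", log.clone())),
        Box::pin(task("b", log.clone())),
    ];
    while !tasks.is_empty() {
        // Poll each task once; keep only the ones still pending.
        tasks.retain_mut(|t| t.as_mut().poll(&mut cx).is_pending());
    }
    let out = log.lock().unwrap().clone();
    out
}

// Parallelism: two OS threads doing CPU work at the same time.
fn run_in_parallel() -> (u64, u64) {
    thread::scope(|s| {
        let h1 = s.spawn(|| (0..1_000u64).sum::<u64>());
        let h2 = s.spawn(|| (1_000..2_000u64).sum::<u64>());
        (h1.join().unwrap(), h2.join().unwrap())
    })
}

fn main() {
    // The steps interleave even though only one thread exists.
    println!("{:?}", run_concurrently()); // ["a0", "b0", "a1", "b1"]
    println!("{:?}", run_in_parallel()); // (499500, 1499500)
}
```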

> There are more questions. I wonder if you could show us how to write runtime-agnostic async code today. Admitting the ecosystem has holes, and setting aside how to solve every problem with async Rust, how do we practitioners future-proof our async code right now so it doesn't get overfit to, or stuck on, the details of the Tokio runtime?

There is no runtime-agnostic way to write things in Rust because the runtime handlers are different. When a future can make progress, it is the runtime's internals that handle it. This means that crates are typically written using the core structures of a single runtime. The only things that are "guaranteed" in Rust are what a Future "can do" and the async/await interface.
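
As a rough illustration of that last point: code that depends only on the `Future` trait and `async`/`.await` does not care which executor drives it; what ties code to a runtime is calling that runtime's spawn, timer, or I/O APIs. In this sketch `sum_in_order` is a hypothetical helper, and the toy `block_on` stands in for whichever runtime you use (Tokio's `Runtime::block_on` plays the same role).

```rust
use std::future::{ready, Future};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Runtime-agnostic: depends only on the std Future trait and .await,
// so any executor (Tokio, async-std, smol, or the toy below) can drive it.
async fn sum_in_order<F>(futs: Vec<F>) -> u32
where
    F: Future<Output = u32>,
{
    let mut total = 0;
    for fut in futs {
        total += fut.await;
    }
    total
}

// Toy busy-polling executor standing in for a real runtime's block_on.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let total = block_on(sum_in_order(vec![ready(1u32), ready(2), ready(3)]));
    println!("{total}"); // prints "6"
}
```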

As for changing runtimes: if you don't know enough to discern between runtimes, you should just use Tokio. It's the de facto standard. You won't need to change.

zamalek · 2024-04-25T17:14:49

Is there any reason why one couldn't do this?

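    use futures::stream::{self, StreamExt};
    use tokio::spawn;

    // async_work, unwrap_join and async_predicate are placeholders
    stream::iter(0..10)
      .map(async_work)
      .map(|t| spawn(t))
      .buffered(3 - 1) // The line above acts as a buffered slot
      .map(unwrap_join)
      .filter_map(async_predicate);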

demurgos · 2024-04-25T22:24:34

The poll_progress post linked above explains the situation. When polling the overall stream, you alternate between awaiting in the buffered adapter and in the subsequent adapters. This is because the different futures are not peers with regard to the executor; instead they form a chain, and `FilterMap` only calls `poll` on its parent once it's done with the current item.
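
To see the chain effect without any crates, here is a toy model. `MiniBuffered` is a hypothetical, heavily simplified stand-in for `buffered(2)` (the real adapter in the futures crate is built on `FuturesOrdered` and only re-polls futures whose wakers fired), `Step` with a `work` label plays the buffered async work, and `Step` with a `busy` label plays slow per-item processing in the consumer's loop. The in-flight futures are polled only from inside the consumer's `poll_fn` call, so while the processing is pending they make no progress.

```rust
use std::collections::VecDeque;
use std::future::{poll_fn, Future};
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type Log = Arc<Mutex<Vec<String>>>;

// Logs `label` on every poll; Pending `pending` times, then Ready(output).
struct Step { label: String, pending: u32, output: u32, log: Log }

impl Future for Step {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        self.log.lock().unwrap().push(self.label.clone());
        if self.pending == 0 {
            return Poll::Ready(self.output);
        }
        self.pending -= 1;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

// Toy in-order buffer: at most `n` futures in flight, and they are polled
// only when the buffer itself is polled (no background progress).
struct MiniBuffered {
    source: std::vec::IntoIter<Step>,
    n: usize,
    in_flight: VecDeque<(Pin<Box<Step>>, Option<u32>)>,
}

impl MiniBuffered {
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<u32>> {
        while self.in_flight.len() < self.n {
            let Some(w) = self.source.next() else { break };
            self.in_flight.push_back((Box::pin(w), None));
        }
        for (fut, out) in self.in_flight.iter_mut() {
            if out.is_none() {
                if let Poll::Ready(v) = fut.as_mut().poll(cx) {
                    *out = Some(v);
                }
            }
        }
        let front_ready = self.in_flight.front().map(|(_, out)| out.is_some());
        match front_ready {
            None => Poll::Ready(None), // source and buffer both exhausted
            Some(true) => Poll::Ready(self.in_flight.pop_front().and_then(|(_, v)| v)),
            Some(false) => Poll::Pending,
        }
    }
}

async fn consume(mut buf: MiniBuffered, log: Log) {
    while let Some(v) = poll_fn(|cx| buf.poll_next(cx)).await {
        log.lock().unwrap().push(format!("got{v}"));
        // Slow processing: nothing polls `buf` while this is pending.
        Step { label: "busy".into(), pending: 2, output: 0, log: log.clone() }.await;
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) { return out; }
    }
}

fn run() -> Vec<String> {
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let work: Vec<Step> = (0..3)
        .map(|id| Step { label: format!("work{id}"), pending: 1, output: id, log: log.clone() })
        .collect();
    let buf = MiniBuffered { source: work.into_iter(), n: 2, in_flight: VecDeque::new() };
    block_on(consume(buf, log.clone()));
    let out = log.lock().unwrap().clone();
    out
}

fn main() {
    println!("{:?}", run());
}
```

In the resulting log, each run of three `busy` entries contains no `work` entries: `work2` is polled once, then waits until the consumer finishes processing item 1 and asks for the next item.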

This post was also helpful for understanding the issue: https://tmandry.gitlab.io/blog/posts/for-await-buffered-stre...

zamalek · 2024-04-26T20:09:44

I do understand the problem, I'm curious if spawn would resolve it.

spawn (in the major executors, at least) behaves as though it spawns a new thread to poll the future on. Therefore work gets underway even if the call site never polls, and the call site only checks the status of the thread/task.
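
A loose, std-only analogy for that distinction, with hypothetical function names: an unpolled future makes no progress at all, whereas spawned work starts on its own and the call site only checks on it later. This uses OS threads purely for illustration; Tokio's `spawn` hands the future to the runtime as a task (on a worker pool in the multi-threaded runtime) rather than creating a thread per call, and on a current-thread runtime spawned tasks only run when the thread yields back to the runtime.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// A future nobody polls: constructing and dropping it never runs its body.
fn unpolled_future_ran() -> bool {
    let ran = Cell::new(false);
    let fut = async { ran.set(true) };
    drop(fut); // never polled
    ran.get()
}

// Spawned work makes progress on its own; the call site only checks the
// result afterwards (here via a shared flag, then join for cleanup).
fn spawned_work_ran() -> bool {
    let done = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&done);
    let handle = thread::spawn(move || flag.store(true, Ordering::SeqCst));
    // The call site never touches `handle` while waiting here.
    while !done.load(Ordering::SeqCst) {
        thread::yield_now();
    }
    handle.join().unwrap();
    done.load(Ordering::SeqCst)
}

fn main() {
    println!("unpolled future ran: {}", unpolled_future_ran()); // false
    println!("spawned work ran: {}", spawned_work_ran()); // true
}
```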

https://tokio.rs/tokio/tutorial/spawning

zokier · 2024-04-26T13:05:09

Isn't this the whole reason for ConcurrentStream [1]? It's supposed to make async streaming more ergonomic and less error-prone.

[1] https://github.com/yoshuawuyts/futures-concurrency/releases/...

0x457 · 2024-04-25T17:13:13

So, buffered and buffer_unordered only make sense at the end of a stream, and only if the receiving side is slower than the rest of the "pipeline"?

qwertox · 2024-04-26T09:06:38

Not really, because items in the pipeline won't get processed as long as the receiving side isn't yielding.

low_tech_punk · 2024-04-25T16:30:17

Good visualization is worth a thousand words! I wonder if a Rust stream can contain streams itself, i.e. higher-order streams as seen in RxJS? I found it very difficult to visualize anything of higher order. RxJS marble diagrams were helpful to some extent, but they are static.

macawfish · 2024-04-25T16:48:54

Yes, higher-order streams are possible in Rust. I appreciate that in Rust they are also typed. In JavaScript it's sometimes tricky to reason about higher-order streams without types.

renewiltord · 2024-04-25T16:50:18

This is an incredible animation engine. I'm going to check it out.

roland35 · 2024-04-25T16:59:22

Bevy is a full-blown game engine, which is an awesome idea for visualizing Rust programs. Maybe it would be good for generating Advent of Code diagrams next year... (Who am I kidding, I barely get to day 12 most years!)