Very misleading, likely deliberately so. As usual, it’s working: here we are talking about it.
It’s evaluating the input prompt at that speed. That’s an important metric and impressive on its own, but anyone reading “going X tokens per second” will assume it means generated tokens.
The actual generation number: 11 tokens per second.
368 tokens per second on input is impressive, and 11 on output is too.
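A rough back-of-the-envelope sketch of why the headline figure misleads. The prompt and answer sizes below are made up, and it assumes both rates stay constant for the whole request, which is a simplification:

```python
# Rates quoted in the thread (tokens per second).
PREFILL_TPS = 368.0  # prompt (input) processing
DECODE_TPS = 11.0    # generation (output)

def breakdown(prompt_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (prefill_seconds, decode_seconds), assuming constant rates."""
    return prompt_tokens / PREFILL_TPS, output_tokens / DECODE_TPS

# Hypothetical request: 1,000-token prompt, 500-token answer.
prefill_s, decode_s = breakdown(1000, 500)
print(f"prefill {prefill_s:.1f} s, decode {decode_s:.1f} s")
# prints "prefill 2.7 s, decode 45.5 s"
```

Even with a prompt twice as long as the answer, almost all of the wait is generation, so 11 tok/s is the number that decides how fast it actually feels.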
The need to taint real, legitimate progress by stretching the numbers, misleading, etc. is yet another classic problem greatly exacerbated by social media.