Simplismart’s software-level optimisations enabled Llama 3.1 8B to achieve a throughput of over 343 tokens per second.
"If I have 100 different users with 100 different checkpoints, [I need] 100 racks of GPU. This is not sustainable." SambaNova is using ...