The Cerebras CS-2 wafer-scale engine (850k cores, 40 GB SRAM)
With 850,000 cores on a single chip, the CS-2 delivers cluster-scale speedup without the communication overhead of parallelizing work across a large cluster of devices. Because everything runs on one chip in one system, no distributed-training or parallel-computing expertise is needed: the CS-2 makes massive-scale acceleration straightforward to program.

On the CS-2, you can deploy large inference models within a real-time latency budget without quantizing or downsizing them, and therefore without sacrificing accuracy.

Every detail of the CS-2 system, from power and data delivery to cooling and packaging, has been carefully engineered to drive the colossal WSE-2. Multiple CS-2s can be clustered together for even greater scale and performance. Even the most ambitious, extreme-scale deep learning applications require far fewer CS-2 systems than large clusters of conventional small devices to reach the same effective compute, which translates into easier deployment, lower engineering costs, and more flexibility.