ICICLE is a library for ZK acceleration using CUDA-enabled GPUs.
Learn more about how to use ICICLE on our documentation website.
What’s new:
- New Golang bindings
- Multi-GPU support
- Vector operations
- Grumpkin curve support
- NTT improvements
- MSM improvements
Read on for details.
Golang bindings
Until recently, our primary focus was on stabilizing the Rust bindings. As a result, the Golang bindings were last brought up to date for ICICLE V0.1.0. This is no longer the case!
The Golang bindings are now stable and support the following curves:
- BLS12-377
- BLS12-381
- BN-254
- BW6-761
The Golang bindings currently support NTT, MSM, and G2 operations. Poseidon hash, vector operations, multi-GPU, and ECNTT are not supported at this time, but will be added soon.
We have also simplified the build process: run our build script to build the libraries for a specific curve.
The new Golang bindings come with the core and cuda_runtime packages, which offer generic memory, stream, and device management, similar to what you may be familiar with in the Rust bindings.
For in-depth documentation and examples, please review our documentation website.
Multi-GPU Support
With multi-GPU support it is now possible to use more than one GPU with ICICLE applications. This means you can support larger circuits, distribute different workloads across multiple GPUs, and scale your applications to a datacenter level.
Currently, multi-GPU is supported only with the Rust and C++ bindings; Golang support is in the works.
Key features:
- Device management API — To make the use of multiple GPUs as easy as possible, we offer a new API that allows you to select devices, list devices, and configure devices.
- Device per thread architecture — Each thread of execution is assigned to a separate GPU device. This model allows for performance, scalability, and ease of use while minimizing complexity.
We adopted the device-per-thread architecture because it aligns with our goals of performance, scalability, and ease of use. Since every GPU is managed in its own thread, developers can use multiple GPUs without coordinating memory management and other per-device state across them. This model is also easier to debug and to integrate into existing codebases.
While there are more efficient architectures for multi-GPU support which may optimize GPU workloads, they are more complex to debug and use. Our multi-GPU support will evolve over time as more complex use cases require more advanced architectures.
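To make the device-per-thread idea concrete, here is a minimal sketch in Python. It does not use the ICICLE API: FakeDevice, worker, and run_on_devices are hypothetical names, and the "device" simply sums numbers on the CPU. It only illustrates the pattern of one thread owning one device, with no device state shared between threads.

```python
import threading


class FakeDevice:
    """Hypothetical stand-in for a GPU; not part of ICICLE."""

    def __init__(self, device_id):
        self.device_id = device_id

    def run(self, workload):
        # Pretend to launch a kernel: here we just sum the inputs on the CPU.
        return sum(workload)


def worker(device_id, workload, results):
    # Each thread selects exactly one device and talks only to that device.
    device = FakeDevice(device_id)
    results[device_id] = device.run(workload)


def run_on_devices(workloads):
    # One thread per device: workload i is handled by device i.
    results = [None] * len(workloads)
    threads = [
        threading.Thread(target=worker, args=(i, w, results))
        for i, w in enumerate(workloads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_on_devices([[1, 2, 3], [10, 20], [100]]))
```

Because each thread writes only to its own slot in results, no locking is needed in this sketch. A real application would replace FakeDevice with the device selection call of the binding it uses.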
To learn more about multi-GPU support, please read our documentation.
Vector Operations
Our new Vector Operations API allows you to add, subtract, and multiply arrays of scalars element by element. These are modular operations; the modulus p is determined automatically by the field type of the scalars.
These new operations support all fields. However, they are currently only available in the Rust and C++ bindings.
To learn more about the new Vector Operation APIs and see examples of their use, please refer to our documentation.
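As a rough illustration of what element-wise modular operations compute, the following Python sketch uses a toy modulus of 17. The function names vec_add, vec_sub, and vec_mul are made up for this example and are not the ICICLE API; real scalar fields use primes of roughly 254 bits or more, and the GPU performs the same arithmetic in parallel.

```python
# Toy modulus for illustration only; an assumption, not an ICICLE field.
P = 17


def _check(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def vec_add(a, b, p=P):
    # Element-wise (a[i] + b[i]) mod p
    _check(a, b)
    return [(x + y) % p for x, y in zip(a, b)]


def vec_sub(a, b, p=P):
    # Element-wise (a[i] - b[i]) mod p; Python's % keeps the result in [0, p)
    _check(a, b)
    return [(x - y) % p for x, y in zip(a, b)]


def vec_mul(a, b, p=P):
    # Element-wise (a[i] * b[i]) mod p
    _check(a, b)
    return [(x * y) % p for x, y in zip(a, b)]


if __name__ == "__main__":
    a, b = [1, 16, 5], [16, 3, 7]
    print(vec_add(a, b))  # [0, 2, 12]
    print(vec_sub(a, b))  # [2, 13, 15]
    print(vec_mul(a, b))  # [16, 14, 1]
```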
Grumpkin Curve
The Grumpkin curve was designed for the Aztec 2.0 protocol to work in tandem with the BN-254 curve, while offering more computational efficiency and smaller proof size.
Because Grumpkin is compatible with BN-254, you can use it to optimize your application without the significant changes that switching to a completely new curve would require.
The Grumpkin curve is now officially supported by ICICLE. Learn more here.
NTT Improvements
Mixed-radix NTT has received multiple updates and now supports:
- Cosets
- Batch mode: you can now use batch mode in the same way you would with Radix-2 NTT
- All ordering modes. For a complete list of ordering modes, read our documentation.
- Fast twiddle mode: accelerates the twiddle factor computation kernels by pre-allocating 4N twiddle factors (N being the maximum NTT size) for rapid access
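The coset, batch, and twiddle-table ideas in the list above can be sketched on a toy field. The Python example below is an assumption-laden illustration, not ICICLE code: it uses the prime 17, a size-8 domain, a naive O(N^2) transform instead of mixed-radix, and a plain table of N twiddles rather than ICICLE's 4N layout. All names (ntt, coset_ntt, batch_ntt, evaluate) are hypothetical.

```python
P = 17         # toy prime field
N = 8          # NTT size; must divide P - 1
OMEGA = 2      # primitive 8th root of unity mod 17 (2**4 % 17 == 16, 2**8 % 17 == 1)
COSET_GEN = 3  # shifts the evaluation domain to {3 * OMEGA**i}

# Precompute the twiddle factors once and reuse them for every transform.
# This trades memory for speed, which is the idea behind a twiddle table.
TWIDDLES = [pow(OMEGA, i, P) for i in range(N)]


def ntt(coeffs):
    # Naive O(N^2) transform: output[i] = f(OMEGA**i), using table lookups.
    assert len(coeffs) == N
    return [
        sum(c * TWIDDLES[(i * j) % N] for j, c in enumerate(coeffs)) % P
        for i in range(N)
    ]


def coset_ntt(coeffs):
    # Evaluating f on g * OMEGA**i equals a plain NTT of the
    # coefficients scaled by g**j.
    scaled = [c * pow(COSET_GEN, j, P) % P for j, c in enumerate(coeffs)]
    return ntt(scaled)


def batch_ntt(batch):
    # Batch mode: the same transform applied independently to each row.
    return [ntt(row) for row in batch]


def evaluate(coeffs, x):
    # Horner evaluation, used only to cross-check the transforms.
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


if __name__ == "__main__":
    f = [1, 2, 3, 4, 5, 6, 7, 8]
    assert ntt(f) == [evaluate(f, pow(OMEGA, i, P)) for i in range(N)]
    assert coset_ntt(f) == [
        evaluate(f, COSET_GEN * pow(OMEGA, i, P) % P) for i in range(N)
    ]
    print("NTT and coset NTT match direct evaluation")
```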
MSM Improvements
The latest updates to the MSM algorithm introduce handling of zero base points, significant performance and memory optimizations, and improvements in scalar sorting and bucket accumulation. For a more in-depth explanation, you can review the pull request comments and MSM documentation.
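For readers unfamiliar with bucket accumulation, here is a small Python sketch of the bucket (Pippenger-style) method. It is not ICICLE's implementation: the additive group of integers modulo a toy number stands in for elliptic curve points, the integer 0 plays the role of the zero (identity) point, and the names group_add, naive_msm, and bucket_msm are invented for this example.

```python
Q = 1_000_003  # toy group: integers mod Q under addition (an assumption)
IDENTITY = 0   # stands in for the zero point


def group_add(a, b):
    return (a + b) % Q


def naive_msm(scalars, points):
    # Reference result: sum of scalar * point over all pairs.
    return sum(s * p for s, p in zip(scalars, points)) % Q


def bucket_msm(scalars, points, window_bits=4, scalar_bits=16):
    # Scalars are assumed to fit in scalar_bits bits.
    num_windows = (scalar_bits + window_bits - 1) // window_bits
    mask = (1 << window_bits) - 1
    result = IDENTITY
    # Process windows from most significant to least significant.
    for w in reversed(range(num_windows)):
        # Shift the accumulated result by one window (window_bits doublings).
        for _ in range(window_bits):
            result = group_add(result, result)
        buckets = [IDENTITY] * (mask + 1)
        for s, p in zip(scalars, points):
            if p == IDENTITY:
                continue  # a zero base point contributes nothing
            digit = (s >> (w * window_bits)) & mask
            if digit != 0:
                buckets[digit] = group_add(buckets[digit], p)
        # Running-sum trick: computes sum of k * buckets[k] with additions only.
        running = IDENTITY
        window_sum = IDENTITY
        for k in range(mask, 0, -1):
            running = group_add(running, buckets[k])
            window_sum = group_add(window_sum, running)
        result = group_add(result, window_sum)
    return result


if __name__ == "__main__":
    scalars = [65535, 42, 1, 4097]
    points = [5, 0, 123456, 999]
    assert bucket_msm(scalars, points) == naive_msm(scalars, points)
    print("bucket MSM matches naive MSM")
```

Skipping identity points early, as in the loop above, is one simple way to handle zero base points; the pull request comments describe what ICICLE actually does.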
Benchmarking on an RTX 3090 Ti shows that the updated MSM algorithm is 1.1% to 17.4% faster, with the most significant gains observed for larger batch sizes.
To enjoy these improvements, just make sure you are using ICICLE V1.4.0 or above.
Wrapping up
For a full list of changes, view our change log.
What’s next?
- Polynomial API
- Montgomery Multiplier API
- More small fields
- Sumcheck Acceleration
- Golang bindings support for all ICICLE features
If you are interested in testing these features pre-release or have some thoughts about design considerations, please reach out to Immanuel.
Follow Ingonyama
Twitter: https://twitter.com/Ingo_zk
Documentation: https://dev.ingonyama.com/
YouTube: https://www.youtube.com/@ingo_zk
GitHub: https://github.com/ingonyama-zk
LinkedIn: https://www.linkedin.com/company/ingonyama
Join us: https://www.ingonyama.com/career