Add OpenCL implementation
Sadly, it is bottlenecked by VRAM latency because global memory is uncached on my Nvidia system, so it only performs about as well as rust-safe.