Figure 4. Performance of the new batched offload implementation, which now rivals the legacy CUDA version. Relative throughput is shown as a function of the number of atoms for a broad range of problems.