For shared memory programming of GPGPU systems, users either have to perform their domain decomposition manually across the available GPUs and GPU Tiles, or leverage implicit scaling mechanisms that transparently scale their offload code across multiple GPU Tiles. The former approach can be cumbersome, and the latter is not always the best performing one.
The Intel MPI library can take that burden off users by letting them program for only a single GPU / Tile and leave the distribution to the library. This can make HPC / GPU programming much easier.
To this end, Intel MPI not only allows pinning individual MPI ranks to individual GPUs or Tiles, but also lets users pass GPU memory pointers directly to the library.
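As a minimal sketch of what this looks like in practice, the following program (assuming an oneAPI toolchain with Intel MPI and a SYCL compiler, e.g. built with `mpiicpx -fsycl`) allocates a device buffer per rank with SYCL USM and hands the device pointer straight to `MPI_Send` / `MPI_Recv`, instead of staging the data through host memory:

```cpp
// Sketch: GPU-aware point-to-point exchange with Intel MPI.
// Assumptions: one GPU / Tile visible per rank, GPU buffer support enabled
// at run time (typically via I_MPI_OFFLOAD=1).
#include <mpi.h>
#include <sycl/sycl.hpp>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    sycl::queue q{sycl::gpu_selector_v};          // the GPU / Tile this rank was pinned to
    const size_t n = 1 << 20;
    double *buf = sycl::malloc_device<double>(n, q);  // device memory, not host memory

    if (rank == 0) {
        q.fill(buf, 1.0, n).wait();               // produce data on the GPU
        MPI_Send(buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);   // pass the device pointer to MPI
    } else if (rank == 1) {
        MPI_Recv(buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("rank 1 received %zu doubles into device memory\n", n);
    }

    sycl::free(buf, q);
    MPI_Finalize();
    return 0;
}
```

A launch could then look like `I_MPI_OFFLOAD=1 mpirun -n 2 ./gpu_aware`, where setting `I_MPI_OFFLOAD_CELL=tile` (or `device`) selects whether ranks are pinned to individual Tiles or to whole GPUs; the exact set of supported variables depends on the installed Intel MPI release.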