Even Scheduling on Virtual GPUs
In AI training, inference, and scientific computing, a single GPU often cannot provide enough compute power or memory, so multiple GPUs must work together. However, dedicating entire GPU cards to a task wastes resources when the task needs only part of each card's memory or compute power. Even scheduling on virtual GPUs addresses this by allocating GPU memory and compute power across multiple cards, so that tasks use only what they request and overall resource utilization improves.
This policy evenly distributes the requested GPU resources (GPU memory and compute power) across multiple virtual GPUs. A pod can therefore use several virtual GPUs at once, each providing an equal share of the requested resources, which enables fine-grained allocation and efficient use of GPUs. Even scheduling on virtual GPUs supports GPU memory isolation (configure volcano.sh/gpu-mem.128Mi) and compute-GPU memory isolation (configure both volcano.sh/gpu-mem.128Mi and volcano.sh/gpu-core.percentage).
- GPU memory isolation: A task's requested GPU memory can be split across multiple cards. For example, if an application requests M MiB of GPU memory and specifies N GPU cards on a single node, CCE evenly allocates the M MiB across the N cards. During execution, the task can use at most M/N MiB of memory on each card, ensuring memory isolation between tasks and preventing resource contention.
- Compute-GPU memory isolation: A task's requested GPU memory and compute power can both be split across multiple cards. For example, if an application requests M MiB of GPU memory and T% of compute power and specifies N GPU cards on a single node, CCE evenly allocates the M MiB of memory and T% of compute power across the N cards. During execution, the task can use at most M/N MiB of memory and T/N% of compute power on each card.

In GPU virtualization, GPU memory is allocated in integer multiples of 128 MiB, so the per-card memory M/N must be a multiple of 128 MiB. Similarly, compute power is allocated in integer multiples of 5%, so the per-card compute power T/N must be a multiple of 5.
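As a quick illustration of these constraints, the following hypothetical values (not part of the example later on this page) request 8192 MiB of GPU memory and 40% of compute power across two GPUs. The per-card shares, 4096 MiB and 20%, are multiples of 128 MiB and 5, so the request can be scheduled; by contrast, the same request across three GPUs (about 2730 MiB and 13.3% per card) could not be.
volcano.sh/gpu-num: '2'                 # Pod label: N = 2 GPU cards
volcano.sh/gpu-mem.128Mi: '64'          # Resource request: M = 64 x 128 MiB = 8192 MiB in total, 4096 MiB per card
volcano.sh/gpu-core.percentage: '40'    # Resource request: T = 40% in total, 20% per card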
Prerequisites
- A CCE standard or Turbo cluster of v1.27.16-r20, v1.28.15-r10, v1.29.10-r10, v1.30.6-r10, v1.31.4-r0, or later is available.
- CCE AI Suite (NVIDIA GPU) has been installed in the cluster. For details, see CCE AI Suite (NVIDIA GPU). The add-on version must meet the following requirements:
- If the cluster version is 1.27 or earlier, the add-on version must be 2.1.41 or later.
- If the cluster version is 1.28 or later, the add-on version must be 2.7.57 or later.
- GPU nodes with virtualization enabled at the cluster or node pool level are available in the cluster. For details, see Preparing Virtualized GPU Resources.
- Volcano v1.16.10 or later has been installed. For details, see Volcano Scheduler. (A quick way to check for an existing installation is shown after this list.)
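If you are not sure whether Volcano is already running, one simple check is to look for its scheduler workloads; in a typical installation they run in the kube-system namespace, although the exact deployment names and namespace may differ in your environment:
kubectl get deployment -n kube-system | grep -i volcano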
Notes and Constraints
- Even scheduling on virtual GPUs is not compatible with Kubernetes' default GPU scheduling, that is, workloads that request nvidia.com/gpu resources.
- Workloads with virtual GPU scheduling enabled cannot trigger auto scaling in the cluster's node pools.
Examples
The following example shows how to create a workload that uses even scheduling on virtual GPUs with GPU memory isolation. The workload runs one pod that requests 8 GiB of GPU memory spread across two GPUs, so each GPU provides 4 GiB of memory. After the workload is created, CCE automatically schedules it to a GPU node that meets these conditions.
- Use kubectl to access the cluster.
- Run the following command to create a YAML file for a workload that requires even scheduling on virtual GPUs:
vim gpu-app.yaml
The file content is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-app
  template:
    metadata:
      labels:
        app: gpu-app
        volcano.sh/gpu-num: '2'    # Number of GPUs for even scheduling. In this example, the pod requests two GPUs, with each GPU providing 4 GiB of memory.
    spec:
      schedulerName: volcano
      containers:
        - image: <your_image_address>    # Replace it with your image address.
          name: container-0
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
              volcano.sh/gpu-mem.128Mi: '64'    # Requested GPU memory. The value 64 indicates that 8 GiB of GPU memory is requested (64 x 128 MiB = 8 GiB).
            limits:
              cpu: 250m
              memory: 512Mi
              volcano.sh/gpu-mem.128Mi: '64'    # Upper limit of the GPU memory that can be used, which is 8 GiB.
      imagePullSecrets:
        - name: default-secret
To enable compute-GPU memory isolation, configure volcano.sh/gpu-core.percentage in both resources.requests and resources.limits so that GPU compute power is also allocated to the pod. Keep in mind that the per-card share must be a multiple of 5. For example, with two GPUs, set volcano.sh/gpu-core.percentage to '10' so that each GPU provides 5% of compute power, as shown in the sketch below.
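The following is a minimal sketch of the container's resources section with compute-GPU memory isolation enabled, keeping the two-GPU layout from the manifest above (each GPU would then provide 4 GiB of memory and 5% of compute power):
resources:
  requests:
    cpu: 250m
    memory: 512Mi
    volcano.sh/gpu-mem.128Mi: '64'          # 8 GiB of GPU memory in total, 4 GiB per GPU
    volcano.sh/gpu-core.percentage: '10'    # 10% of compute power in total, 5% per GPU
  limits:
    cpu: 250m
    memory: 512Mi
    volcano.sh/gpu-mem.128Mi: '64'
    volcano.sh/gpu-core.percentage: '10'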
- Run the following command to create the workload:
kubectl apply -f gpu-app.yaml
If information similar to the following is displayed, the workload has been created:
deployment.apps/gpu-app created
- Run the following command to view the created pod:
kubectl get pod -n default
Information similar to the following is displayed:
NAME                      READY   STATUS    RESTARTS   AGE
gpu-app-6bdb4d7cb-pmtc2   1/1     Running   0          21s
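If the pod instead stays in the Pending state, its scheduling events usually explain why (for example, no node has enough free virtual GPU resources). The pod name below is from this example; replace it with your own:
kubectl describe pod gpu-app-6bdb4d7cb-pmtc2 -n default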
- Log in to the pod and check the total GPU memory allocated to the pod.
kubectl exec -it gpu-app-6bdb4d7cb-pmtc2 -- nvidia-smi
Expected output:
Fri Mar  7 03:36:03 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.03             Driver Version: 535.216.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:0D.0 Off |                    0 |
| N/A   33C    P8               9W /  70W |      0MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla T4                       Off | 00000000:00:0E.0 Off |                    0 |
| N/A   34C    P8               9W /  70W |      0MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
The command output shows that the pod can use two GPUs, each providing 4096 MiB (4 GiB) of GPU memory. The pod's requested 8 GiB of GPU memory has been evenly distributed across the two GPUs, and each GPU's memory allocation is isolated from other tasks.
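As an optional further check, you can inspect a GPU node's allocatable extended resources; on a node with GPU virtualization enabled, volcano.sh/gpu-mem.128Mi (and volcano.sh/gpu-core.percentage, if compute power is virtualized) should be listed there. The node name is a placeholder:
kubectl get node <node_name> -o jsonpath='{.status.allocatable}'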