GPU Rigid Bodies
Introduction
PhysX rigid body simulation can be configured to take advantage of CUDA-capable GPUs under Linux or Windows. This provides a performance benefit roughly proportional to the arithmetic complexity of a scene. The feature supports the entire rigid body pipeline, including articulations. The state of GPU-accelerated rigid bodies can be modified and queried using exactly the same API as CPU rigid bodies, and GPU rigid bodies can easily be used in conjunction with character controllers (CCTs) and vehicles.
Using GPU Rigid Bodies
GPU rigid bodies are no more difficult to use than CPU rigid bodies: they use exactly the same API and the same classes. GPU rigid body acceleration is enabled on a per-scene basis. If enabled, all rigid bodies occupying the scene will be processed by the GPU. The feature is implemented in CUDA and requires a GPU of compute capability SM 3.0 (Kepler) or later. If no compatible device is found, the simulation falls back to the CPU and corresponding error messages are issued.
The GPU acceleration feature is split into two components: rigid body dynamics and broad phase. These are enabled using PxSceneFlag::eENABLE_GPU_DYNAMICS and by setting PxSceneDesc::broadPhaseType to PxBroadPhaseType::eGPU, respectively. These are immutable properties of the scene. In addition, you must initialize the CUDA context manager and set it on the PxSceneDesc.
A snippet demonstrating how to enable GPU rigid body simulation is provided in SnippetHelloGRB.
The code example below serves as a brief reference:
// Create the CUDA context manager, which owns the CUDA context used by the scene.
PxCudaContextManagerDesc cudaContextManagerDesc;
gCudaContextManager = PxCreateCudaContextManager(*gFoundation, cudaContextManagerDesc, PxGetProfilerCallback());

PxSceneDesc sceneDesc(gPhysics->getTolerancesScale());
sceneDesc.gravity = PxVec3(0.0f, -9.81f, 0.0f);

// A CPU dispatcher is still required: the CPU runs the parts of the
// pipeline that are not GPU-accelerated.
gDispatcher = PxDefaultCpuDispatcherCreate(4);
sceneDesc.cpuDispatcher = gDispatcher;
sceneDesc.filterShader = PxDefaultSimulationFilterShader;

// Hand the CUDA context manager to the scene and enable both GPU components.
sceneDesc.cudaContextManager = gCudaContextManager;
sceneDesc.flags |= PxSceneFlag::eENABLE_GPU_DYNAMICS; // GPU rigid body dynamics
sceneDesc.broadPhaseType = PxBroadPhaseType::eGPU;    // GPU broad phase

gScene = gPhysics->createScene(sceneDesc);
Enabling GPU rigid body dynamics turns on GPU-accelerated contact generation, shape/body management and the GPU-accelerated constraint solver. This accelerates the majority of the discrete rigid body pipeline.
Turning on GPU broad phase replaces the CPU broad phase with a GPU-accelerated broad phase.
Each component can be enabled independently. For example, you may combine GPU broad phase with CPU rigid body dynamics, CPU broad phase (SAP, MBP or ABP) with GPU rigid body dynamics, or GPU broad phase with GPU rigid body dynamics.
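As a sketch, configuring only the GPU broad phase while keeping rigid body dynamics on the CPU (reusing the sceneDesc from the snippet above) amounts to:

```cpp
// GPU broad phase only: broadPhaseType selects the GPU broad phase, while
// leaving PxSceneFlag::eENABLE_GPU_DYNAMICS unset keeps contact generation
// and the constraint solver on the CPU.
sceneDesc.broadPhaseType = PxBroadPhaseType::eGPU;
// sceneDesc.flags deliberately does not include PxSceneFlag::eENABLE_GPU_DYNAMICS.
```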
What is GPU accelerated?
The GPU rigid body feature provides GPU-accelerated implementations of:
Broad Phase
Contact generation
Shape and body management
Constraint solver
All other features are performed on the CPU.
There are several caveats to GPU contact generation. These are as follows:
Convex hulls require PxCookingParams::buildGPUData to be set to true to build the data required to perform contact generation on the GPU. If the user raises the PxConvexFlag::eCOMPUTE_CONVEX flag, the cooking function will ensure that the GPU contact generation limits of 64 vertices and 64 polygons are respected. Note that this may generate suboptimal convex hulls, in which case building without PxCookingParams::buildGPUData raised might make more sense, but contact generation for those hulls will then be performed on the CPU.
Triangle meshes require PxCookingParams::buildGPUData to be set to true to build the data required to process the mesh on the GPU. If this flag is not set during cooking, the GPU data for the mesh will be absent and any contact generation involving the mesh will be processed on the CPU.
Any pairs requesting contact modification will be processed on the CPU.
PxSceneFlag::eENABLE_PCM must be enabled for GPU contact generation to be performed; this is the only form of contact generation implemented on the GPU. If eENABLE_PCM is not raised, contact generation will be processed on the CPU for all pairs, using the legacy non-distance-based contact generation.
Irrespective of whether contact generation for a given pair is processed on CPU or GPU, the GPU solver will process all pairs with contacts that request collision response in their filter shader.
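The cooking and scene settings involved in these caveats can be sketched as follows (a minimal illustration, not a complete cooking setup):

```cpp
// Cooking: build GPU-compatible data so contact generation for convex
// hulls and triangle meshes can run on the GPU.
PxCookingParams cookingParams(gPhysics->getTolerancesScale());
cookingParams.buildGPUData = true;

// Scene: PCM is the only form of contact generation implemented on the
// GPU, so it must be enabled alongside the GPU dynamics flag.
sceneDesc.flags |= PxSceneFlag::eENABLE_PCM;
sceneDesc.flags |= PxSceneFlag::eENABLE_GPU_DYNAMICS;
```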
The GPU rigid body solver provides full support for joints and contacts. However, best performance is achieved using D6 joints because D6 joints are natively supported on the GPU, i.e., the full solver pipeline from prep to solve is implemented on the GPU. Other joint types are supported by the GPU solver but their joint shaders are run on the CPU. This will incur some additional host-side performance overhead compared to D6 joints.
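As a sketch, a D6 joint between two bodies might be created as follows (the body variables and local frames are hypothetical):

```cpp
// D6 joints run their full prep + solve pipeline on the GPU.
// body0/body1 and the attachment frames are placeholder values.
PxD6Joint* joint = PxD6JointCreate(*gPhysics,
    body0, PxTransform(PxVec3(0.0f,  1.0f, 0.0f)),   // frame in body0's space
    body1, PxTransform(PxVec3(0.0f, -1.0f, 0.0f)));  // frame in body1's space

// All axes are locked by default, behaving like a fixed joint; unlocking
// selected axes emulates other joint types while staying GPU-native.
joint->setMotion(PxD6Axis::eTWIST, PxD6Motion::eFREE); // hinge-like twist
```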
Performance Considerations
GPU rigid bodies can provide extremely large performance advantages over CPU rigid bodies in scenes with several thousand active rigid bodies. However, there are some performance considerations to be taken into account.
D6 joints will provide best performance when used with GPU rigid bodies. Other joint types will be partially GPU-accelerated but the performance advantages will be less than the performance advantage exhibited by D6 joints.
Convex hulls with more than 64 vertices or polygons, or with more than 32 vertices per face, will have their contacts processed by the CPU rather than the GPU, so, if possible, keep vertex counts within these limits. Vertex limits can be defined in cooking to ensure that cooked convex hulls do not exceed these limits.
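A sketch of requesting such a limit at cooking time (the input vertex data is hypothetical):

```cpp
PxConvexMeshDesc convexDesc;
convexDesc.points.count  = numVerts;            // hypothetical input data
convexDesc.points.stride = sizeof(PxVec3);
convexDesc.points.data   = verts;               // hypothetical vertex array
convexDesc.flags         = PxConvexFlag::eCOMPUTE_CONVEX;
// Cap the hull size so it stays within the GPU contact generation limits.
convexDesc.vertexLimit   = 64;
```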
If your application makes heavy use of contact modification, this may limit the number of pairs that have contact generation performed on the GPU.
Modifying the state of actors forces data to be re-synced to the GPU, e.g. transforms must be updated if the application adjusts an actor's global pose, velocities must be updated if the application modifies the bodies' velocities, etc. The associated cost of re-syncing data to the GPU is relatively low, but it should be taken into consideration.
Features such as joint projection, CCD and triggers are not GPU accelerated and are still processed on the CPU.
GPU Memory
Unlike CPU PhysX, the GPU simulation is not able to dynamically grow all buffers. It is therefore necessary to provide pre-allocated buffer sizes for the GPU simulation; these are specified at scene creation through PxSceneDesc::gpuDynamicsConfig. Please refer to the API documentation for the exact meaning of the members of PxGpuDynamicsMemoryConfig. The default values are generally sufficient for scenes simulating approximately 10,000 rigid bodies.
If insufficient memory is available, the system will issue warnings and discard overflowing objects such as contacts, constraints or pairs, which means that behavior may be adversely affected.
For tuning the pre-allocated sizes of subsequent runs, the simulation statistics (see PxSimulationStatistics and Section Simulation Statistics) contain a PxGpuDynamicsMemoryConfigStatistics which reports the actual required values of the parameters in PxGpuDynamicsMemoryConfig for a simulation with GPU dynamics/broad phase.
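Raising the pre-allocated capacities before scene creation might look like the sketch below; the member names shown are illustrative and vary between SDK versions, so check PxGpuDynamicsMemoryConfig in your headers for the exact fields:

```cpp
// Increase pre-allocated GPU buffer sizes before creating the scene.
// Member names are illustrative; consult PxGpuDynamicsMemoryConfig in
// your SDK version for the exact fields and their defaults.
sceneDesc.gpuDynamicsConfig.maxRigidContactCount   = 512 * 1024; // contact capacity
sceneDesc.gpuDynamicsConfig.maxRigidPatchCount     = 80 * 1024;  // contact patch capacity
sceneDesc.gpuDynamicsConfig.foundLostPairsCapacity = 256 * 1024; // broad phase pair reports
```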
Running Out Of GPU Memory
GPU memory is a limited resource. Compared to traditional system memory, there is no transparent mechanism to swap rarely used memory regions to disk to avoid allocation failures and crashes. Therefore, it is possible that memory allocations can fail inside the GPU simulation pipeline because of insufficient memory. In case of a failed allocation, PhysX will stop all GPU work submission and mark the current scene as corrupted to avoid thrashing the CUDA context and crashing the GPU. As a consequence of this, PhysX will remain usable but all scenes that have been marked corrupted have to be recreated to continue simulation.
In case of an out-of-memory event, the following will happen:
The GPU memory allocator will report an error with code PxErrorCode::eOUT_OF_MEMORY.
The PxCudaContext in use will be transitioned to abort mode and will skip any future work submission. PxCudaContext::getLastError() will return CUDA_ERROR_OUT_OF_MEMORY.
Many GPU-related errors will be reported, because GPU work is being skipped.
The next call to PxScene::fetchResults() or PxScene::fetchCollision() will return an error with code PxErrorCode::eABORT.
The scene that was simulated will be marked as corrupt, and any attempt to simulate it further will return an error with code PxErrorCode::eINTERNAL_ERROR. There is no guarantee that any data of a corrupt scene obtained through the API will be correct.
Whether the PxCudaContext is currently in abort mode can be queried by calling PxCudaContext::isInAbortMode(). This can be used instead of parsing the errors, or when the PhysX CUDA context manager is used outside of the core SDK for other GPU work. In case of an allocation failure outside of the SDK, the PhysX GPU allocator will not transition the PxCudaContext to abort mode, but the user can do so themselves by calling PxCudaContext::setAbortMode(true).
To re-enable simulation and recover the PxCudaContext from an out-of-memory scenario, the context can be reset by calling PxCudaContext::setAbortMode(false). Any scene that was marked corrupt needs to be recreated, and the user needs to verify that sufficient GPU memory is available.
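The recovery flow described above might be sketched as follows (the scene-recreation helper is a hypothetical application function):

```cpp
// After fetchResults() reported PxErrorCode::eABORT, inspect the CUDA context.
PxCudaContext* cudaContext = gCudaContextManager->getCudaContext();
if (cudaContext->isInAbortMode())
{
    // Release the corrupt scene; its data can no longer be trusted.
    gScene->release();

    // Free GPU memory or reduce budgets as appropriate, then reset the context.
    cudaContext->setAbortMode(false);

    // Recreate the scene from scratch (hypothetical application helper).
    gScene = recreateScene(*gPhysics);
}
```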