GPU Rigid Bodies

Introduction

PhysX rigid body simulation can be configured to take advantage of CUDA-capable GPUs under Linux or Windows. This provides a performance benefit proportional to the arithmetic complexity of a scene. It supports the entire rigid body pipeline feature set, including articulations. The state of GPU-accelerated rigid bodies can be modified and queried using the exact same API as used to modify and query CPU rigid bodies. GPU rigid bodies can easily be used in conjunction with character controllers (CCTs) and vehicles.

Using GPU Rigid Bodies

GPU rigid bodies are no more difficult to use than CPU rigid bodies: they use the exact same API and the same classes. GPU rigid body acceleration is enabled on a per-scene basis. If enabled, all rigid bodies in the scene will be processed by the GPU. This feature is implemented in CUDA and requires an SM3.0 (Kepler) or later compatible GPU. If no compatible device is found, simulation will fall back to the CPU and corresponding error messages will be issued.

The GPU acceleration feature is split into two components: rigid body dynamics and broad phase. These are enabled by raising PxSceneFlag::eENABLE_GPU_DYNAMICS and by setting PxSceneDesc::broadPhaseType to PxBroadPhaseType::eGPU, respectively. Both are immutable properties of the scene. In addition, you must initialize the CUDA context manager and set it on the PxSceneDesc. A snippet demonstrating how to enable GPU rigid body simulation is provided in SnippetHelloGRB. The code example below serves as a brief reference:

// Create the CUDA context manager that the GPU pipeline will use.
PxCudaContextManagerDesc cudaContextManagerDesc;

gCudaContextManager = PxCreateCudaContextManager(*gFoundation, cudaContextManagerDesc, PxGetProfilerCallback());

PxSceneDesc sceneDesc(gPhysics->getTolerancesScale());
sceneDesc.gravity = PxVec3(0.0f, -9.81f, 0.0f);
// A CPU dispatcher is still required; here it uses 4 worker threads.
gDispatcher = PxDefaultCpuDispatcherCreate(4);
sceneDesc.cpuDispatcher = gDispatcher;
sceneDesc.filterShader  = PxDefaultSimulationFilterShader;
// Hand the CUDA context manager to the scene.
sceneDesc.cudaContextManager = gCudaContextManager;

// Enable GPU rigid body dynamics and the GPU broad phase.
sceneDesc.flags |= PxSceneFlag::eENABLE_GPU_DYNAMICS;
sceneDesc.broadPhaseType = PxBroadPhaseType::eGPU;

gScene = gPhysics->createScene(sceneDesc);

Enabling GPU rigid body dynamics turns on GPU-accelerated contact generation, shape/body management and the GPU-accelerated constraint solver. This accelerates the majority of the discrete rigid body pipeline.

Turning on GPU broad phase replaces the CPU broad phase with a GPU-accelerated broad phase.

Each can be enabled independently. For example, you may enable GPU broad phase with CPU rigid body dynamics, use a CPU broad phase (SAP, MBP or ABP) with GPU rigid body dynamics, or combine GPU broad phase with GPU rigid body dynamics.
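The combinations described above can be summarized in a small self-contained sketch. It does not call PhysX: the names GpuSceneOptions and stagesOnGpu are hypothetical and only model which pipeline stages each of the two settings moves to the GPU, as described in this section.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the two independent scene settings (not PhysX API).
// gpuDynamics   stands for PxSceneFlag::eENABLE_GPU_DYNAMICS being raised.
// gpuBroadPhase stands for broadPhaseType being PxBroadPhaseType::eGPU.
struct GpuSceneOptions {
    bool gpuDynamics;
    bool gpuBroadPhase;
};

// Returns the names of the pipeline stages that run on the GPU
// for the given combination of settings.
std::vector<std::string> stagesOnGpu(const GpuSceneOptions& opts) {
    std::vector<std::string> stages;
    if (opts.gpuBroadPhase) {
        stages.push_back("broad phase");
    }
    if (opts.gpuDynamics) {
        stages.push_back("contact generation");
        stages.push_back("shape/body management");
        stages.push_back("constraint solver");
    }
    return stages;
}
```

For instance, stagesOnGpu({true, false}) models a CPU broad phase combined with GPU rigid body dynamics and returns the three dynamics stages only.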

What is GPU accelerated?

The GPU rigid body feature provides GPU-accelerated implementations of:

Broad Phase
Contact generation
Shape and body management
Constraint solver

All other features are performed on the CPU.

There are several caveats to GPU contact generation. These are as follows:

Convex hulls require PxCookingParams::buildGPUData to be set to true to build the data required to perform contact generation on the GPU. If the user raises the PxConvexFlag::eCOMPUTE_CONVEX flag, the cooking function ensures that the limits of 64 vertices and 64 polygons for GPU contact generation are respected. Note that this may generate suboptimal convex hulls. In that case, cooking without PxCookingParams::buildGPUData might make more sense, but contact generation for those hulls will then be performed on the CPU.
Triangle meshes require PxCookingParams::buildGPUData to be set to true to build the data required to process the mesh on the GPU. If this flag is not set during cooking, the GPU data for the mesh will be absent and any contact generation involving this mesh will be processed on the CPU.
Any pairs requesting contact modification will be processed on the CPU.
PxSceneFlag::eENABLE_PCM must be enabled for GPU contact generation to be performed. PCM is the only form of contact generation implemented on the GPU. If eENABLE_PCM is not raised, contact generation for all pairs will be processed on the CPU using the legacy contact generation, which is not distance-based.
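The caveats above amount to a per-pair decision, which the following self-contained sketch tries to capture. It is an illustration under stated assumptions, not PhysX code: PairInfo, ContactGenDevice and contactGenDevice are hypothetical names, and the sketch covers only the conditions listed in this section.

```cpp
// Hypothetical description of one collision pair (not PhysX API).
struct PairInfo {
    bool involvesConvexOrMesh;         // pair contains a convex hull or triangle mesh
    bool gpuDataCooked;                // that geometry was cooked with buildGPUData = true
    bool requestsContactModification;  // filter shader asked for contact modification
};

enum class ContactGenDevice { Cpu, Gpu };

// Models where contact generation for a pair runs, following the caveats
// listed in this section. Assumption: no other conditions apply.
ContactGenDevice contactGenDevice(bool gpuDynamicsEnabled,
                                  bool pcmEnabled,
                                  const PairInfo& pair) {
    if (!gpuDynamicsEnabled) {
        return ContactGenDevice::Cpu;  // GPU dynamics not enabled on the scene
    }
    if (!pcmEnabled) {
        return ContactGenDevice::Cpu;  // PCM is the only GPU contact generation
    }
    if (pair.requestsContactModification) {
        return ContactGenDevice::Cpu;  // contact modification is CPU-only
    }
    if (pair.involvesConvexOrMesh && !pair.gpuDataCooked) {
        return ContactGenDevice::Cpu;  // GPU data for the geometry is absent
    }
    return ContactGenDevice::Gpu;
}
```

The result describes contact generation only. As noted in this section, the GPU solver still processes pairs whose contacts were generated on the CPU.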

Irrespective of whether contact generation for a given pair is processed on CPU or GPU, the GPU solver will process all pairs with contacts that request collision response in their filter shader.

The GPU rigid body solver provides full support for joints and contacts. However, best performance is achieved using D6 joints because D6 joints are natively supported on the GPU, i.e., the full solver pipeline from prep to solve is implemented on the GPU. Other joint types are supported by the GPU solver but their joint shaders are run on the CPU. This will incur some additional host-side performance overhead compared to D6 joints.

Tuning

Unlike CPU PhysX, the GPU rigid bodies feature is not able to dynamically grow all buffers. Therefore, it is necessary to provide some fixed buffer sizes for the GPU rigid body feature. If a buffer is too small, the system will issue warnings and discard contacts/constraints/pairs, which means that simulation behavior may be adversely affected. The buffer sizes that can be adjusted are contained in PxSceneDesc::gpuDynamicsConfig. The default values are generally sufficient for scenes simulating approximately 10,000 rigid bodies. Please refer to the API documentation for the exact meaning of the members of PxgDynamicsMemoryConfig.
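To illustrate why an undersized fixed buffer changes simulation results, here is a minimal self-contained sketch of a fixed-capacity buffer that drops items once it is full. FixedCapacityBuffer is a hypothetical class written for this illustration; it does not mirror how PhysX implements or reports its GPU buffers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity buffer (not PhysX internals).
// Once the capacity chosen at construction is reached, further items
// are discarded and only counted, similar in spirit to contacts being
// dropped when a GPU buffer is too small.
template <typename T>
class FixedCapacityBuffer {
public:
    explicit FixedCapacityBuffer(std::size_t capacity) : mCapacity(capacity) {
        mItems.reserve(capacity);
    }

    // Stores the item if space remains; otherwise records a drop.
    bool push(const T& item) {
        if (mItems.size() >= mCapacity) {
            ++mDropped;
            return false;
        }
        mItems.push_back(item);
        return true;
    }

    std::size_t size() const { return mItems.size(); }
    std::size_t dropped() const { return mDropped; }

private:
    std::size_t mCapacity;
    std::size_t mDropped = 0;
    std::vector<T> mItems;
};
```

A non-zero dropped() count in this model corresponds to the situation in which the real system would issue a warning, which is the signal to raise the relevant size in PxSceneDesc::gpuDynamicsConfig.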

Performance Considerations

GPU rigid bodies can provide extremely large performance advantages over CPU rigid bodies in scenes with several thousand active rigid bodies. However, there are some performance considerations to be taken into account.

D6 joints will provide the best performance when used with GPU rigid bodies. Other joint types will be partially GPU-accelerated, but their performance advantage will be smaller than that of D6 joints.
Convex hulls with more than 64 vertices, more than 64 polygons, or more than 32 vertices per face will have their contacts processed by the CPU rather than the GPU, so, if possible, keep hulls within these limits. Vertex limits can be defined in cooking to ensure that cooked convex hulls do not exceed these limits.
If your application makes heavy use of contact modification, this may limit the number of pairs that have contact generation performed on the GPU.
Modifying the state of actors forces data to be re-synced to the GPU. For example, actor transforms must be updated if the application adjusts the global pose, and velocities must be updated if the application modifies the bodies' velocities. The cost of re-syncing data to the GPU is relatively low, but it should be taken into consideration.
Features such as joint projection, CCD and triggers are not GPU accelerated and are still processed on the CPU.
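The convex hull limits mentioned in the list above can be expressed as a simple check. The sketch below is self-contained and uses hypothetical names (HullStats, fitsGpuContactGenLimits); the three constants are the limits quoted in this section.

```cpp
#include <cstdint>

// Limits quoted in this section for GPU contact generation on convex hulls.
constexpr std::uint32_t kMaxGpuHullVertices = 64;
constexpr std::uint32_t kMaxGpuHullPolygons = 64;
constexpr std::uint32_t kMaxGpuVerticesPerFace = 32;

// Hypothetical summary of a cooked convex hull (not a PhysX type).
struct HullStats {
    std::uint32_t vertexCount;
    std::uint32_t polygonCount;
    std::uint32_t maxVerticesPerFace;  // largest vertex count of any single face
};

// Returns true if the hull stays within all three limits, i.e. its
// contacts are eligible for GPU processing as far as hull size goes.
constexpr bool fitsGpuContactGenLimits(const HullStats& hull) {
    return hull.vertexCount <= kMaxGpuHullVertices &&
           hull.polygonCount <= kMaxGpuHullPolygons &&
           hull.maxVerticesPerFace <= kMaxGpuVerticesPerFace;
}
```

In an application, the counts would have to be read from the cooked mesh, for example through PxConvexMesh::getNbVertices() and PxConvexMesh::getNbPolygons(). How HullStats gets filled in is left open here.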