GPU Simulation

Introduction

PhysX simulation can be configured to take advantage of CUDA-capable GPUs under Linux or Windows. This provides a performance benefit proportional to the arithmetic complexity of a scene. GPU simulation supports the entire rigid body feature set, including articulations, as well as FEM-based soft bodies and particle systems. The state of a GPU-accelerated simulation can be modified and queried using the same API that is used for CPU simulation, or using the Direct GPU API when it is enabled.

GPU Rigid Bodies

GPU rigid bodies were the first GPU-simulated feature introduced to PhysX and thus defined the structure of PhysX’s GPU simulation, which was later extended with PBD particle systems and FEM deformable body actors, as well as SDF collision geometry.

GPU rigid bodies are no more difficult to use than CPU rigid bodies: they use exactly the same API and classes. GPU rigid body acceleration is enabled on a per-scene basis. If enabled, all rigid bodies in the scene will be processed by the GPU. This feature is implemented in CUDA and requires a compatible GPU with compute capability SM 6.0 (Pascal) or later. If no compatible device is found, simulation will fall back to the CPU and corresponding error messages will be reported. GPU rigid bodies can be used in conjunction with character controllers (CCTs) and vehicles.

The GPU acceleration feature is split into two components: rigid body dynamics and broad phase. These are enabled by raising PxSceneFlag::eENABLE_GPU_DYNAMICS and by setting PxSceneDesc::broadPhaseType to PxBroadPhaseType::eGPU, respectively. Both are immutable properties of the scene. In addition, you must initialize the CUDA context manager and set it on the PxSceneDesc. A snippet demonstrating how to enable GPU rigid body simulation is provided in SnippetHelloGRB. The code example below serves as a brief reference:

// Create the CUDA context manager that the GPU pipeline runs on.
PxCudaContextManagerDesc cudaContextManagerDesc;

gCudaContextManager = PxCreateCudaContextManager(*gFoundation, cudaContextManagerDesc, PxGetProfilerCallback());

PxSceneDesc sceneDesc(gPhysics->getTolerancesScale());
sceneDesc.gravity = PxVec3(0.0f, -9.81f, 0.0f);
gDispatcher = PxDefaultCpuDispatcherCreate(4);
sceneDesc.cpuDispatcher = gDispatcher;
sceneDesc.filterShader  = PxDefaultSimulationFilterShader;
sceneDesc.cudaContextManager = gCudaContextManager; // must be set for GPU simulation

// Both settings are immutable once the scene has been created.
sceneDesc.flags |= PxSceneFlag::eENABLE_GPU_DYNAMICS; // GPU rigid body dynamics
sceneDesc.broadPhaseType = PxBroadPhaseType::eGPU;    // GPU broad phase

gScene = gPhysics->createScene(sceneDesc);

Enabling GPU rigid body dynamics turns on GPU-accelerated contact generation, shape/body management and the GPU-accelerated constraint solver. This accelerates the majority of the discrete rigid body pipeline.

Turning on GPU broad phase replaces the CPU broad phase with a GPU-accelerated broad phase.

Each can be enabled independently. For example, you may enable GPU broad phase with CPU rigid body dynamics, CPU broad phase (SAP, MBP or ABP) with GPU rigid body dynamics, or GPU broad phase with GPU rigid body dynamics.

PBD Particle Systems

The PhysX Position-based Dynamics (PBD) particle system is a robust simulation framework designed to handle a variety of dynamic interactions using particles. Unlike traditional particle solvers that prioritize velocity calculations, PBD focuses on determining particle positions that satisfy specific constraints, such as collision detection and volume preservation, before computing velocities. This method enhances stability and allows for larger simulation time-steps.
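The position-first ordering described above can be illustrated with a minimal CPU sketch. This is not PhysX code and does not reflect how the PhysX solver is implemented: the `Particle` struct, the `pbdStep` function and the single ground-plane constraint are hypothetical, chosen only to show one particle being predicted, projected onto a constraint, and then given a velocity derived from its corrected position.

```cpp
#include <cassert>

// Hypothetical one-dimensional particle; not a PhysX type.
struct Particle {
    float y;   // height above the ground plane at y = 0
    float vy;  // vertical velocity
};

// One position-based step: predict, project onto the constraint, then
// derive the velocity from the corrected position.
inline void pbdStep(Particle& p, float gravity, float dt) {
    assert(dt > 0.0f);
    const float yPrev = p.y;

    // 1. Predict a candidate position from the current velocity and gravity.
    float yNew = yPrev + (p.vy + gravity * dt) * dt;

    // 2. Project the candidate onto the constraint y >= 0 (ground contact).
    if (yNew < 0.0f) {
        yNew = 0.0f;
    }

    // 3. Only now compute the velocity, from the change in position.
    p.vy = (yNew - yPrev) / dt;
    p.y = yNew;
}
```

Because the velocity is recomputed from positions that already satisfy the constraint, a particle that hits the ground cannot keep a velocity that would carry it further below the plane, which is one intuition for the stability of the approach.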

A PBD particle system in PhysX requires a CUDA-capable GPU and a CUDA context manager for its creation. The system can simulate diverse dynamics, including fluids, deformable objects, cloth, and inflatable structures. This flexibility allows for complex simulations involving multiple materials and interaction types. The PhysX documentation provides various snippets demonstrating the setup and configuration of PBD particle systems, showcasing its versatility in simulating realistic physical behaviors in virtual environments.

FEM Deformable Bodies

PhysX provides deformable body features for simulating materials using the Finite Element Method (FEM). Volume deformables are used to represent volumetric materials such as rubber or biological tissues, while surface deformables are suited for thin materials like cloth or shells.

PhysX’s deformable body simulation requires GPU acceleration, allowing for high-performance and scalable simulations.

Users can configure various material properties, including Young’s Modulus, which defines stiffness, and dynamic friction, which governs how a deformable body interacts with other objects.

These features provide a powerful tool for creating realistic and interactive simulations of deformable materials in various applications, from robotics and scientific visualization to VFX and gaming.

SDF Mesh Geometry

Signed Distance Fields (SDFs) in PhysX are a powerful tool for enhancing rigid body collision detection. An SDF represents the distance from any point in space to the nearest surface of a 3D object, with the sign indicating whether the point is inside or outside the object. This method allows for highly accurate and efficient collision detection, especially for complex and concave shapes.
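To make the idea of a signed distance concrete, here is a small sketch for an analytic sphere. It is not PhysX code: PhysX stores sampled SDFs for cooked meshes, whereas `Vec3` and `sphereSignedDistance` below are hypothetical helpers. The sketch assumes the common convention that distances are negative inside the object and positive outside.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type; not a PhysX type.
struct Vec3 {
    float x, y, z;
};

// Signed distance from point p to the surface of a sphere.
// Assumed convention: negative inside, zero on the surface, positive outside.
inline float sphereSignedDistance(const Vec3& p, const Vec3& center, float radius) {
    assert(radius > 0.0f);
    const float dx = p.x - center.x;
    const float dy = p.y - center.y;
    const float dz = p.z - center.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz) - radius;
}

// A point penetrates the object when its signed distance is negative.
inline bool isInside(const Vec3& p, const Vec3& center, float radius) {
    return sphereSignedDistance(p, center, radius) < 0.0f;
}
```

A collision test against such a field reduces to evaluating the distance at a query point and comparing it with zero (or with a contact offset), which is why the same lookup works for concave shapes without decomposing them.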

In PhysX, SDFs are used to create detailed and precise collision boundaries without the need for convex decomposition-based collision. This results in faster computations and more realistic physical interactions. The SDF feature is particularly beneficial in scenarios where high precision is required, such as in simulations involving intricate geometries or when simulating deformable body interactions with rigid bodies.

The implementation of SDFs in PhysX leverages the GPU for rapid calculations, ensuring that the performance impact is minimal even in real-time applications. This makes it suitable for use in gaming, virtual reality, and other interactive simulations where both accuracy and performance are critical.

Direct GPU API

PhysX’s SDK provides the Direct GPU API that allows direct access to GPU data of a PhysX scene, including simulation state like actors’ poses and velocities, as well as control inputs like applied forces and torques. This API enables batched direct access to GPU data for rigid dynamic objects, articulations, and shapes, facilitating efficient integration with GPU-based applications such as end-to-end GPU reinforcement learning pipelines.

To use the Direct GPU API, the PxSceneFlag::eENABLE_DIRECT_GPU_API flag must be raised along with PxSceneFlag::eENABLE_GPU_DYNAMICS and PxBroadPhaseType::eGPU broad phase type during scene creation. The PxDirectGPUAPI instance can then be retrieved from the scene, and its member functions called to read from or write to GPU buffers, specifying the actor indices and data type to access.
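The dependencies between these settings can be summarized in a small pre-flight check that an application might run before creating a scene. This is a sketch under stated assumptions: `GpuSceneConfig`, `ConfigIssue` and `checkGpuSceneConfig` are hypothetical names that are not part of the PhysX SDK, and the struct merely mirrors the flags named in this document.

```cpp
#include <cassert>

// Hypothetical summary of the scene settings named in the text; not a PhysX type.
struct GpuSceneConfig {
    bool hasCudaContextManager;  // PxSceneDesc::cudaContextManager is set
    bool gpuDynamics;            // PxSceneFlag::eENABLE_GPU_DYNAMICS is raised
    bool gpuBroadPhase;          // broadPhaseType is PxBroadPhaseType::eGPU
    bool directGpuApi;           // PxSceneFlag::eENABLE_DIRECT_GPU_API is raised
};

enum class ConfigIssue {
    None,
    MissingCudaContextManager,
    DirectApiNeedsGpuDynamics,
    DirectApiNeedsGpuBroadPhase
};

// Returns the first rule from the text that the configuration violates.
inline ConfigIssue checkGpuSceneConfig(const GpuSceneConfig& c) {
    const bool anyGpuFeature = c.gpuDynamics || c.gpuBroadPhase || c.directGpuApi;
    if (anyGpuFeature && !c.hasCudaContextManager) {
        return ConfigIssue::MissingCudaContextManager;
    }
    if (c.directGpuApi && !c.gpuDynamics) {
        return ConfigIssue::DirectApiNeedsGpuDynamics;
    }
    if (c.directGpuApi && !c.gpuBroadPhase) {
        return ConfigIssue::DirectApiNeedsGpuBroadPhase;
    }
    return ConfigIssue::None;
}

// Usage: GPU dynamics alone is valid, the Direct GPU API without GPU broad phase is not.
inline void configCheckExample() {
    const GpuSceneConfig dynamicsOnly{true, true, false, false};
    const GpuSceneConfig directWithoutBroadPhase{true, true, false, true};
    assert(checkGpuSceneConfig(dynamicsOnly) == ConfigIssue::None);
    assert(checkGpuSceneConfig(directWithoutBroadPhase) == ConfigIssue::DirectApiNeedsGpuBroadPhase);
}
```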

The deformable body and particle system features are not covered by the Direct GPU API. Instead, the corresponding APIs offer functionality to directly access the state of the simulated objects.

What is GPU accelerated?

The GPU simulation feature provides GPU-accelerated implementations of:

Broad Phase
Contact generation
Shape and body management
Constraint solver

All other features are performed on the CPU.

There are several caveats to GPU contact generation. These are as follows:

Convex hulls require PxCookingParams::buildGPUData to be set to true to build data required to perform contact generation on the GPU. The cooking function will ensure that the limits of 64 vertices and polygons for GPU contact generation will be respected, if the user raises the PxConvexFlag::eCOMPUTE_CONVEX flag. Note that this may generate suboptimal convex hulls, in which case building without PxCookingParams::buildGPUData raised might make more sense, but contact generation will be performed on the CPU in this case.
Triangle meshes require PxCookingParams::buildGPUData to be set to true to build data required to process the mesh on the GPU. If this flag is not set during cooking, the GPU data for the mesh will be absent and any contact generation involving this mesh will be processed on CPU.
Any pairs requesting contact modification will be processed on the CPU.
PxSceneFlag::eENABLE_PCM must be enabled for GPU contact generation to be performed. This is the only form of contact generation implemented on the GPU. If eENABLE_PCM is not raised, contact generation will be processed on the CPU for all pairs using the legacy, non-distance-based contact generation.

Irrespective of whether contact generation for a given pair is processed on CPU or GPU, the GPU solver will process all pairs with contacts that request collision response in their filter shader.
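The caveats above amount to a per-pair decision about where contact generation runs. The following sketch models that decision only; it is not the SDK's internal logic. `PairInfo`, `Processor` and `contactGenProcessor` are hypothetical names, and the `hasGpuData` fields assume the application already knows whether each shape's geometry was cooked with GPU data and within the GPU limits.

```cpp
#include <cassert>

enum class Processor { Cpu, Gpu };

// Hypothetical per-pair summary; not a PhysX type.
struct PairInfo {
    bool shape0HasGpuData;             // geometry cooked with GPU data, within GPU limits
    bool shape1HasGpuData;
    bool requestsContactModification;  // pair asks for contact modification
};

// Where contact generation for a pair is expected to run, per the caveats in the text.
inline Processor contactGenProcessor(bool pcmEnabled, const PairInfo& pair) {
    if (!pcmEnabled) {
        return Processor::Cpu;  // only PCM contact generation is implemented on the GPU
    }
    if (pair.requestsContactModification) {
        return Processor::Cpu;  // contact modification forces the CPU path
    }
    if (!pair.shape0HasGpuData || !pair.shape1HasGpuData) {
        return Processor::Cpu;  // missing GPU data for a convex hull or triangle mesh
    }
    return Processor::Gpu;
}

// Usage: the same pair moves to the CPU as soon as it requests contact modification.
inline void routingExample() {
    const PairInfo plain{true, true, false};
    const PairInfo modified{true, true, true};
    assert(contactGenProcessor(true, plain) == Processor::Gpu);
    assert(contactGenProcessor(true, modified) == Processor::Cpu);
}
```

Note that this only concerns contact generation. As stated above, the resulting contacts are still handed to the GPU solver regardless of which path produced them.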

The GPU solver provides full support for joints and contacts. However, best performance is achieved using D6 joints because D6 joints are natively supported on the GPU, i.e., the full solver pipeline from prep to solve is implemented on the GPU. Other joint types are supported by the GPU solver but their joint shaders are run on the CPU. This will incur some additional host-side performance overhead compared to D6 joints.

Performance Considerations

GPU simulation can provide extremely large performance advantages over CPU simulation in scenes with several thousand active actors. However, there are some performance considerations to be taken into account.

D6 joints will provide best performance when used with GPU rigid bodies. Other joint types will be partially GPU-accelerated but the performance advantages will be less than the performance advantage exhibited by D6 joints.
Convex hulls with more than 64 vertices or polygons, or with more than 32 vertices per face, will have their contacts processed by the CPU rather than the GPU. If possible, keep vertex counts within these limits. Vertex limits can be defined in cooking to ensure that cooked convex hulls do not exceed these limits.
If your application makes heavy use of contact modification, this may limit the number of pairs that have contact generation performed on the GPU.
Modifying the state of actors forces data to be re-synced to the GPU. For example, transforms must be updated if the application adjusts an actor’s global pose, and velocities must be updated if the application modifies a body’s velocity. The cost of re-syncing data to the GPU is relatively low, but it should be taken into consideration.
Features such as joint projection, CCD and triggers are not GPU accelerated and are still processed on the CPU.
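The convex hull limits listed above are easy to check in an asset pipeline before cooking. The sketch below assumes the application can summarize a hull as a vertex count plus a per-face vertex count; `HullSummary` and `fitsGpuContactLimits` are hypothetical names and not part of the PhysX SDK, and the constants simply restate the numbers given in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of a convex hull; not a PhysX type.
struct HullSummary {
    std::size_t vertexCount;
    std::vector<std::size_t> verticesPerFace;  // one entry per polygon
};

// Limits as stated in the text for GPU contact generation.
constexpr std::size_t kMaxGpuHullVertices = 64;
constexpr std::size_t kMaxGpuHullPolygons = 64;
constexpr std::size_t kMaxGpuVerticesPerFace = 32;

// True if the hull stays within all three limits.
inline bool fitsGpuContactLimits(const HullSummary& hull) {
    if (hull.vertexCount > kMaxGpuHullVertices) {
        return false;
    }
    if (hull.verticesPerFace.size() > kMaxGpuHullPolygons) {
        return false;
    }
    for (std::size_t count : hull.verticesPerFace) {
        if (count > kMaxGpuVerticesPerFace) {
            return false;
        }
    }
    return true;
}

// Usage: a cube (8 vertices, 6 quads) is well within the limits.
inline void hullLimitExample() {
    const HullSummary cube{8, std::vector<std::size_t>(6, std::size_t{4})};
    assert(fitsGpuContactLimits(cube));
}
```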

GPU Memory

Unlike CPU PhysX, the GPU simulation is not able to dynamically grow all buffers. Therefore, it is necessary to provide pre-allocated buffer sizes for the GPU simulation, which are specified at scene creation through PxSceneDesc::gpuDynamicsConfig. Please refer to the API documentation for the exact meaning of the members of PxGpuDynamicsMemoryConfig.

The default values are generally sufficient for scenes simulating approximately 10,000 rigid bodies. If insufficient memory is available, the system will issue warnings and discard overflowing objects such as contacts, constraints or pairs, which may adversely affect simulation behavior. To help tune the pre-allocated sizes for subsequent runs, the simulation statistics (see PxSimulationStatistics and Section Simulation Statistics) contain a PxGpuDynamicsMemoryConfigStatistics, which reports the actual required values of the parameters in PxGpuDynamicsMemoryConfig for a simulation with GPU dynamics/broadphase.
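One possible tuning policy is to take each required value reported by the statistics, add some headroom, and use the result as the configured capacity of the next run. The sketch below shows that arithmetic for a single parameter. It is an illustration rather than SDK behavior: `nextRunCapacity` is a hypothetical helper, the headroom percentage is an assumption to be chosen per application, and the policy of never shrinking below the currently configured value is a design choice, not a PhysX requirement.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Suggests a capacity for the next run from the configured capacity and the
// required value observed in the statistics of a previous run.
// Hypothetical helper; not part of the PhysX SDK.
inline std::uint32_t nextRunCapacity(std::uint32_t configured,
                                     std::uint32_t required,
                                     std::uint32_t headroomPercent) {
    assert(headroomPercent <= 100);

    // Use 64-bit arithmetic so that adding headroom cannot overflow.
    const std::uint64_t padded =
        std::uint64_t{required} + (std::uint64_t{required} * headroomPercent) / 100;

    // Never shrink below what is already configured (assumed policy).
    const std::uint64_t wanted = padded > configured ? padded : std::uint64_t{configured};

    // Clamp to the range of a 32-bit capacity value.
    const std::uint64_t limit = std::numeric_limits<std::uint32_t>::max();
    return static_cast<std::uint32_t>(wanted > limit ? limit : wanted);
}
```

For example, a buffer configured for 1000 entries that actually needed 2000 would be sized at 2500 entries for the next run with 25 percent headroom.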

Running Out Of GPU Memory

GPU memory is a limited resource. Unlike traditional system memory, it has no transparent mechanism for swapping rarely used memory regions to disk to avoid allocation failures and crashes. Memory allocations inside the GPU simulation pipeline can therefore fail because of insufficient memory. If an allocation fails, PhysX will stop all GPU work submission and mark the current scene as corrupted to avoid thrashing the CUDA context and crashing the GPU. As a consequence, PhysX remains usable, but all scenes that have been marked corrupted have to be recreated to continue simulation.

In case of an out-of-memory event, the following will happen:

The GPU memory allocator will report an error with code PxErrorCode::eOUT_OF_MEMORY.
The PxCudaContext in use will be transitioned to abort mode, and skip any future work submission.
PxCudaContext::getLastError() will return CUDA_ERROR_OUT_OF_MEMORY.
Many GPU-related errors will be reported because GPU work is skipped.
The next call to PxScene::fetchResults() or PxScene::fetchCollision() will return an error with code PxErrorCode::eABORT.
The scene that was simulated will be marked as corrupt, and any attempt to simulate further will return an error with code PxErrorCode::eINTERNAL_ERROR.
There is no guarantee that any data of a corrupt scene obtained through the API will be correct.

Querying whether the PxCudaContext is currently in abort mode can be done by calling PxCudaContext::isInAbortMode(). This can be used instead of parsing the errors, or when the PhysX CUDA context manager is used outside of the core SDK for other GPU work. In case of an allocation failure outside of the SDK, the PhysX GPU allocator will not transition the PxCudaContext to abort mode, but the user can do so themselves by calling PxCudaContext::setAbortMode(true).

To reenable simulation and recover the PxCudaContext from an out-of-memory scenario, the PxCudaContext can be reset by calling PxCudaContext::setAbortMode(false). Any scene that was marked corrupt needs to be recreated, and the user needs to verify that sufficient GPU memory is available.
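The recovery procedure can be sketched as follows. The sketch does not use the PhysX headers: `MockCudaContext` is a hypothetical stand-in that only mimics the `isInAbortMode()` and `setAbortMode()` calls named above, `SceneSlot` is a hypothetical record of an application scene, and "recreating" a scene is modeled by bumping a generation counter. The order of the steps is an assumption drawn from the description in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for PxCudaContext that models only the two calls named in the text.
class MockCudaContext {
public:
    bool isInAbortMode() const { return mAbortMode; }
    void setAbortMode(bool abort) { mAbortMode = abort; }

private:
    bool mAbortMode = false;
};

// Hypothetical application-side record of a scene.
struct SceneSlot {
    bool corrupt;    // scene was marked corrupt after a failed allocation
    int generation;  // incremented each time the scene is recreated
};

// Leaves abort mode and recreates corrupt scenes, but only once the caller has
// verified that enough GPU memory is available. Returns the number of scenes recreated.
inline std::size_t recoverFromOutOfMemory(MockCudaContext& context,
                                          std::vector<SceneSlot>& scenes,
                                          bool enoughGpuMemory) {
    if (!context.isInAbortMode() || !enoughGpuMemory) {
        return 0;  // nothing to recover, or recovery would fail again
    }
    context.setAbortMode(false);  // allow GPU work submission again

    std::size_t recreated = 0;
    for (SceneSlot& scene : scenes) {
        if (scene.corrupt) {
            ++scene.generation;  // placeholder for releasing and recreating the scene
            scene.corrupt = false;
            ++recreated;
        }
    }
    return recreated;
}

// Usage: one corrupt scene is recreated once memory is available again.
inline void recoveryExample() {
    MockCudaContext context;
    context.setAbortMode(true);
    std::vector<SceneSlot> scenes{{true, 0}, {false, 0}};
    assert(recoverFromOutOfMemory(context, scenes, true) == 1);
    assert(!context.isInAbortMode());
}
```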

GPU Simulation Compatibility

This section outlines the compatibility of CPU-based PhysX features with GPU simulation, highlighting potential incompatibilities and performance considerations.

Custom geometry is processed exclusively on the CPU and does not interact with GPU-only features such as deformable bodies and particles. When custom geometry interacts with SDF meshes, collision falls back to TriangleMesh collision, which can lead to suboptimal collision quality and performance.
Collision data generated by custom geometry and contact modification is not available through the Direct GPU API, limiting its use in GPU-accelerated simulations.
Features like contact modification, which are used to implement conveyor belts, rely on the CPU narrow phase. These features do not support collisions with GPU-only entities like deformable bodies and particles, potentially limiting their use in GPU-accelerated simulations.
While the GPU rigid body solver supports various joint types, only the D6 joint has a full GPU implementation, making it the most efficient for GPU simulations. Other joint types are partially GPU-accelerated but involve CPU processing, which can reduce performance.
Scene queries are executed on the CPU and do not interact with GPU-only features such as deformable bodies. This limitation means that any collision queries involving GPU-accelerated entities must be handled separately or through CPU fallback mechanisms.
Both the deformable body and particle system features are GPU-only and thus will not interact with CPU-only features like custom geometry, contact modification or scene queries.