Simulation
The Simulation Loop
Use the method PxScene::simulate()
to advance the world forward in time.
Here is a simplified code snippet for a fixed time stepper:
mAccumulator = 0.0f;
mStepSize = 1.0f / 60.0f;

virtual bool advance(PxReal dt)
{
    mAccumulator += dt;
    if(mAccumulator < mStepSize)
        return false;

    mAccumulator -= mStepSize;

    mScene->simulate(mStepSize);
    return true;
}
This code can be called whenever the application is done with processing events and is starting to idle.
It accumulates elapsed real time until at least a sixtieth of a second has built up, and then calls PxScene::simulate(), which moves all objects in the scene forward by that interval.
This is probably the simplest of very many different ways to deal with time when stepping the simulation forward.
To allow the simulation to finish and return the results, simply call:
mScene->fetchResults(true);
Passing true indicates that the call should block until the simulation step is finished, so that on return the results are guaranteed to be available.
When PxScene::fetchResults()
completes, any simulation event callback functions that you defined will also be called.
See the chapter Callback Sequence.
Until PxScene::fetchResults()
returns, the results of the current simulation step are not available.
It is not allowed to add, remove or modify scene objects while the simulation is running.
See the chapter Threading for more details about reading and writing while the simulation is running.
For the human eye to perceive animated motion as smooth, use at least twenty discrete frames per second, with each frame corresponding to a physics time step. To have smooth, realistic simulation of more complex physical scenes, use at least fifty frames per second.
Note
If you are making a real-time interactive simulation, you may be tempted to take different sized time steps which correspond to the amount of real time that has elapsed since the last simulation frame. Be very careful if you do this, rather than taking constant-sized time steps: The simulation code is sensitive to both very small and large time steps, and also to too much variation between time steps. In these cases it will likely produce jittery simulation.
See Simulation memory for details on how memory is used in simulation.
Island Management
For performance reasons the simulation of scenes with multiple actors is split up into multiple islands, which are solved independently. Each actor is assigned to exactly one island and during the solve procedure actors can only influence other actors within the same island.
Islands are created by finding the connected components of a graph in which the actors are the nodes and the edges represent connections between actors.
The edges of the graph may be interactions between actors, for example contacts, but also explicit constraints such as attachments and joints.
Most of these edges are created automatically by PhysX when contacts occur or joints are being added to the scene.
In cases where users can directly write constraints into buffers, for example PxParticleSystem::addParticleBuffer()
, the SDK does not inspect the content of these constraint buffers until the constraint solver is invoked.
Therefore, it is necessary that users manually take care of creating edges (e.g., through PxParticleSystem::addRigidAttachment()
) such that the islands can be properly formed and interacting actors end up in the same island.
Also note that for performance reasons, independent small islands may be fused into a bigger island.
This is particularly the case for GPU simulation, where the entire scene is merged into a single island.
Each actor in PhysX has a method for defining solver iteration counts, see e.g., Solver Iterations. Given those per-actor iteration counts, each solver island performs as many iterations as the actor with the highest iteration count requests. Therefore, the actual number of solver iterations each body and constraint undergoes can change over time as the assignment of actors to islands changes and actors are going to sleep. If these varying iteration counts are not desired, all actors in the scene should be configured to require the same number of iterations.
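For example, a minimal sketch that assigns identical iteration counts to every dynamic actor might look like the following (the counts of 8 position and 2 velocity iterations are arbitrary example values, and the scene pointer is assumed):

PxU32 nbDynamic = scene->getNbActors(PxActorTypeFlag::eRIGID_DYNAMIC);
std::vector<PxActor*> actors(nbDynamic);
scene->getActors(PxActorTypeFlag::eRIGID_DYNAMIC, actors.data(), nbDynamic);
for(PxU32 i = 0; i < nbDynamic; i++)
{
    if(PxRigidDynamic* body = actors[i]->is<PxRigidDynamic>())
        body->setSolverIterationCounts(8, 2);   // identical counts, so island merging does not change the effective iteration count
}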
Callback Sequence
PhysX callbacks allow any application to listen for events and react as required.
The following callbacks are executed:
onConstraintBreak
onWake
onSleep
onContact
onTrigger
onAdvance
To listen to any of these events it is necessary to subclass PxSimulationEventCallback
so that the various virtual functions may be implemented as desired.
An instance of this subclass can then be registered per scene with either PxScene::setSimulationEventCallback()
or PxSceneDesc::simulationEventCallback
.
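A minimal sketch of such a subclass and its registration could look like this (the class name MySimulationEventCallback and the scene pointer are assumptions; only the events of interest need non-empty bodies):

class MySimulationEventCallback : public PxSimulationEventCallback
{
public:
    void onConstraintBreak(PxConstraintInfo* constraints, PxU32 count) override {}
    void onWake(PxActor** actors, PxU32 count) override {}
    void onSleep(PxActor** actors, PxU32 count) override {}
    void onContact(const PxContactPairHeader& pairHeader, const PxContactPair* pairs, PxU32 nbPairs) override {}
    void onTrigger(PxTriggerPair* pairs, PxU32 count) override {}
    void onAdvance(const PxRigidBody* const* bodyBuffer, const PxTransform* poseBuffer, const PxU32 count) override {}
};

static MySimulationEventCallback gMyEventCallback;
scene->setSimulationEventCallback(&gMyEventCallback);   // or set PxSceneDesc::simulationEventCallback before scene creation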
Following these steps alone will ensure that constraint break events are successfully reported.
One more step is required to report sleep and wake events: to avoid the expense of reporting every sleep and wake transition, the flag PxActorFlag::eSEND_SLEEP_NOTIFIES must be raised on each actor for which sleep/wake notifications are desired.
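For illustration, raising the flag on a single actor might look like this (actor is an assumed pointer to a PxRigidDynamic or other PxActor):

actor->setActorFlag(PxActorFlag::eSEND_SLEEP_NOTIFIES, true);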
Finally, to receive onContact and onTrigger events it is necessary to set a flag in the filter shader callback for all pairs of interacting objects for which events are required.
More details on the filter shader callback can be found in Collision Filtering.
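As a hedged sketch, a filter shader that requests trigger events for trigger pairs and contact-found notifications for everything else might look roughly like this (the function name is an assumption; it would be assigned to PxSceneDesc::filterShader):

PxFilterFlags mySimulationFilterShader(
    PxFilterObjectAttributes attributes0, PxFilterData filterData0,
    PxFilterObjectAttributes attributes1, PxFilterData filterData1,
    PxPairFlags& pairFlags, const void* constantBlock, PxU32 constantBlockSize)
{
    PX_UNUSED(filterData0); PX_UNUSED(filterData1);
    PX_UNUSED(constantBlock); PX_UNUSED(constantBlockSize);

    // Trigger pairs generate onTrigger events.
    if(PxFilterObjectIsTrigger(attributes0) || PxFilterObjectIsTrigger(attributes1))
    {
        pairFlags = PxPairFlag::eTRIGGER_DEFAULT;
        return PxFilterFlag::eDEFAULT;
    }

    // All other pairs collide normally and report onContact when touch is found.
    pairFlags = PxPairFlag::eCONTACT_DEFAULT | PxPairFlag::eNOTIFY_TOUCH_FOUND;
    return PxFilterFlag::eDEFAULT;
}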
Each callback allows read operations to be performed on the relevant actors involved in each event.
It is important to note that for all events except PxSimulationEventCallback::onAdvance()
, these read operations will return the state of the actors at the end of the simulation step rather than the state the actors had when the event was first detected during the course of the simulation step.
This particularly affects the callbacks PxSimulationEventCallback::onTrigger()
, PxSimulationEventCallback::onContact()
and PxSimulationEventCallback::onConstraintBreak()
.
The linear velocity, angular velocity and pose used to detect PxSimulationEventCallback::onContact()
events can be retrieved by amending the simulation filter shader with the flags PxPairFlag::ePRE_SOLVER_VELOCITY
and PxPairFlag::eCONTACT_EVENT_POSE
.
This leads to code as follows:
virtual void onContact(const PxContactPairHeader& pairHeader, const PxContactPair* pairs, PxU32 nbPairs)
{
    // Retrieve the current poses and velocities of the two actors involved in the contact event.
    {
        const PxTransform body0PoseAtEndOfSimulateStep = pairHeader.actors[0]->getGlobalPose();
        const PxTransform body1PoseAtEndOfSimulateStep = pairHeader.actors[1]->getGlobalPose();
        const PxVec3 body0LinVelAtEndOfSimulateStep = pairHeader.actors[0]->is<PxRigidDynamic>() ? pairHeader.actors[0]->is<PxRigidDynamic>()->getLinearVelocity() : PxVec3(PxZero);
        const PxVec3 body1LinVelAtEndOfSimulateStep = pairHeader.actors[1]->is<PxRigidDynamic>() ? pairHeader.actors[1]->is<PxRigidDynamic>()->getLinearVelocity() : PxVec3(PxZero);
        const PxVec3 body0AngVelAtEndOfSimulateStep = pairHeader.actors[0]->is<PxRigidDynamic>() ? pairHeader.actors[0]->is<PxRigidDynamic>()->getAngularVelocity() : PxVec3(PxZero);
        const PxVec3 body1AngVelAtEndOfSimulateStep = pairHeader.actors[1]->is<PxRigidDynamic>() ? pairHeader.actors[1]->is<PxRigidDynamic>()->getAngularVelocity() : PxVec3(PxZero);
    }

    // Retrieve the poses and velocities of the two actors involved in the contact event as they were
    // when the contact event was detected.
    PxContactPairExtraDataIterator iter(pairHeader.extraDataStream, pairHeader.extraDataStreamSize);
    while(iter.nextItemSet())
    {
        const PxTransform body0PoseAtContactEvent = iter.eventPose->globalPose[0];
        const PxTransform body1PoseAtContactEvent = iter.eventPose->globalPose[1];
        const PxVec3 body0LinearVelocityAtContactEvent = iter.preSolverVelocity->linearVelocity[0];
        const PxVec3 body1LinearVelocityAtContactEvent = iter.preSolverVelocity->linearVelocity[1];
        const PxVec3 body0AngularVelocityAtContactEvent = iter.preSolverVelocity->angularVelocity[0];
        const PxVec3 body1AngularVelocityAtContactEvent = iter.preSolverVelocity->angularVelocity[1];
    }
}
The PxSimulationEventCallback::onAdvance()
callback provides early access to the new pose of moving rigid bodies.
When this call occurs, rigid bodies that have the flag PxRigidBodyFlag::eENABLE_POSE_INTEGRATION_PREVIEW
raised were moved by the simulation and their new poses can be accessed using the provided buffers.
This callback is different from the others mentioned above in the sense that it will get called while the simulation is running.
As a consequence, code in this callback should be as lightweight as possible, as it will block the simulation.
It is forbidden to perform write operations in any callback.
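For illustration, a minimal onAdvance() body that only reads from the provided buffers, in keeping with the read-only rule above, might look like this (it would replace the empty onAdvance() in the assumed MySimulationEventCallback subclass sketched earlier):

void onAdvance(const PxRigidBody* const* bodyBuffer, const PxTransform* poseBuffer, const PxU32 count) override
{
    for(PxU32 i = 0; i < count; i++)
    {
        const PxRigidBody* body = bodyBuffer[i];     // body with eENABLE_POSE_INTEGRATION_PREVIEW raised
        const PxTransform& newPose = poseBuffer[i];  // its previewed pose for this step
        // e.g. copy newPose into an application-side buffer consumed by the renderer
        PX_UNUSED(body); PX_UNUSED(newPose);
    }
}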
Simulation memory
PhysX relies on the application for all memory allocation. The primary interface is via the PxAllocatorCallback
interface required to initialize the SDK:
class PxAllocatorCallback
{
public:
    virtual ~PxAllocatorCallback() {}
    virtual void* allocate(size_t size, const char* typeName, const char* filename,
        int line) = 0;
    virtual void deallocate(void* ptr) = 0;
};
After the first argument, which gives the size of the allocation, the next three arguments are an identifier name, which identifies the type of allocation, and the __FILE__ and __LINE__ location inside the SDK code where the allocation was made.
More details of these function arguments can be found in the API documentation: PxAllocatorCallback
.
Note
An important change since 2.x: the SDK now requires that the memory returned by the allocator be 16-byte aligned.
On many platforms malloc()
returns memory that is 16-byte aligned, but on Windows the system function _aligned_malloc()
provides this capability.
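A minimal sketch of an allocator that satisfies this alignment requirement might look like this (the platform split via _MSC_VER is a simplification and an assumption):

class MyAllocatorCallback : public PxAllocatorCallback
{
public:
    void* allocate(size_t size, const char* typeName, const char* filename, int line) override
    {
        PX_UNUSED(typeName); PX_UNUSED(filename); PX_UNUSED(line);
#ifdef _MSC_VER
        return _aligned_malloc(size, 16);        // 16-byte aligned allocation on Windows
#else
        void* ptr = nullptr;
        posix_memalign(&ptr, 16, size);          // 16-byte aligned allocation on POSIX platforms
        return ptr;
#endif
    }

    void deallocate(void* ptr) override
    {
#ifdef _MSC_VER
        _aligned_free(ptr);
#else
        free(ptr);
#endif
    }
};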
Note
On some platforms PhysX uses system library calls to determine the correct type name, and the system function that returns the type name may call the system memory allocator.
If you are instrumenting system memory allocations, you may observe this behavior.
To prevent PhysX requesting type names, disable allocation names using the method PxFoundation::setReportAllocationNames()
.
Minimizing dynamic allocation is an important aspect of performance tuning. PhysX provides several mechanisms to control and analyze memory usage. These shall be discussed in turn.
Scene Limits
The number of allocations for tracking objects can be minimized by presizing the capacities of scene data structures, using either PxSceneDesc::limits
before creating the scene or the function PxScene::setLimits()
.
It is useful to note that these limits do not represent hard limits, meaning that PhysX will automatically perform further allocations if the number of objects exceeds the scene limits.
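For example, presizing a few of the scene data structures at creation time might look like this (the capacities are arbitrary example values):

PxSceneDesc sceneDesc(physics->getTolerancesScale());
sceneDesc.limits.maxNbActors        = 1000;
sceneDesc.limits.maxNbBodies        = 800;
sceneDesc.limits.maxNbStaticShapes  = 500;
sceneDesc.limits.maxNbDynamicShapes = 1000;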
16K Data Blocks
Much of the memory PhysX uses for simulation is held in a pool of blocks, each 16K in size.
The initial number of blocks allocated to the pool can be controlled by setting PxSceneDesc::nbContactDataBlocks
, while the maximum number of blocks that can ever be in the pool is governed by PxSceneDesc::maxNbContactDataBlocks
.
If PhysX internally needs more blocks than nbContactDataBlocks
then it will automatically allocate further blocks to the pool until the number of blocks reaches maxNbContactDataBlocks
.
If PhysX subsequently needs more blocks than the maximum number of blocks, it will simply start dropping contacts and joint constraints.
When this happens, warnings are passed to the error stream in the PX_CHECKED
configuration.
To help tune nbContactDataBlocks
and maxNbContactDataBlocks
it can be useful to query the number of blocks currently allocated to the pool using the function PxScene::getNbContactDataBlocksUsed()
.
It can also be useful to query the maximum number of blocks that can ever be allocated to the pool with PxScene::getMaxNbContactDataBlocksUsed()
.
Unused blocks can be reclaimed using PxScene::flushSimulation()
.
When this function is called any allocated blocks not required by the current scene state will be deleted so that they may be reused by the application.
Additionally, a number of other memory resources are freed by shrinking them to the minimum size required by the scene configuration.
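Putting these pieces together, a sketch of configuring and monitoring the block pool might look like this (the block counts are arbitrary example values):

sceneDesc.nbContactDataBlocks    = 64;    // blocks allocated to the pool up front
sceneDesc.maxNbContactDataBlocks = 256;   // hard cap on the pool size

// ... after simulating for a while ...
const PxU32 blocksUsed = scene->getNbContactDataBlocksUsed();
const PxU32 blocksMax  = scene->getMaxNbContactDataBlocksUsed();

scene->flushSimulation();                 // reclaim blocks not needed by the current scene state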
Scratch Buffer
A scratch memory block may be passed as a function argument to the function PxScene::simulate()
.
As far as possible, PhysX will internally allocate temporary buffers from the scratch memory block, thereby reducing the need to perform temporary allocations from PxAllocatorCallback
.
The block may be reused by the application after the PxScene::fetchResults()
call, which marks the end of the simulation.
One restriction on the scratch memory block is that its size must be a multiple of 16K, and it must be 16-byte aligned.
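A sketch of passing a scratch block to the simulation might look like this (the size of 32 blocks is an arbitrary example value, and myAlignedAlloc16 is a hypothetical application helper that returns 16-byte aligned memory):

const PxU32 scratchSize = 32 * 16384;                   // multiple of 16K
void* scratchBlock = myAlignedAlloc16(scratchSize);     // hypothetical 16-byte aligned allocation

scene->simulate(1.0f / 60.0f, NULL, scratchBlock, scratchSize);
scene->fetchResults(true);
// scratchBlock may be reused or freed by the application from here on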
In-Place Serialization
PhysX objects can be stored in memory owned by the application using PhysX’ binary deserialization mechanism. See Serialization for details.
GPU Memory
When simulating a GPU-accelerated scene (see GPU Simulation), PhysX will allocate GPU device memory and pinned host memory. Contrary to CPU-side host memory, these allocations are made though the appropriate CUDA and GPU driver APIs and do not use the application-provided allocator. For details on GPU memory management, refer to the Section GPU Memory.
Completion Tasks
A completion task is a Task that executes once the chain of simulation tasks triggered during PxScene::simulate()
has finished.
If PhysX has been configured to use worker threads then PxScene::simulate()
will start simulation tasks on the worker threads and will likely exit before the worker threads have completed the work necessary to complete the scene update.
A typical completion task would first need to call PxScene::fetchResults(true)
to wrap up the simulation update step.
After calling PxScene::fetchResults(true)
, the completion task can perform any other post-physics work deemed necessary by the application:
scene.fetchResults(true);
game.updateA();
game.updateB();
...
game.updateZ();
The completion task is specified as a function argument in PxScene::simulate()
.
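As a hedged sketch, a completion task could be implemented on top of PxLightCpuTask, which is one possible base class for a PxBaseTask (reference-counting details of the task API are omitted here; the class name and the post-physics work are assumptions):

class MyCompletionTask : public PxLightCpuTask
{
public:
    void run() override
    {
        mScene->fetchResults(true);      // wrap up the simulation step
        // ... game.updateA() ... game.updateZ() ...
    }

    const char* getName() const override { return "MyCompletionTask"; }

    PxScene* mScene;
};

// Elsewhere, the task is passed as the completion-task argument of simulate():
myCompletionTask.mScene = scene;
scene->simulate(1.0f / 60.0f, &myCompletionTask);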
Synchronizing with Other Threads
An important consideration for substepping is that PxScene::simulate()
and PxScene::fetchResults()
are considered write calls on the scene, and it is therefore illegal to read from or write to a scene while those functions are running.
Note
PhysX does not lock its scene graph, but it will report an error in the checked build if it detects that multiple threads make concurrent calls to the same scene, unless they are all read calls.
Substepping
For reasons of simulation fidelity or stability it is often desirable for the simulation frequency of PhysX to be higher than the update rate of the application.
The simplest way to do this is just to call PxScene::simulate()
and PxScene::fetchResults()
multiple times:
for(PxU32 i=0; i<substepCount; i++)
{
    ... pre-simulation work (update controllers, etc) ...

    scene->simulate(substepSize);
    scene->fetchResults(true);

    ... post simulation work (process physics events, etc) ...
}
Sub-stepping can also be integrated with the completion task feature of the simulate()
function. To illustrate this, consider the situation where the scene is simulated until the graphics component signals that it has completed updating the render state of the scene.
Here, the completion task will run once the simulation tasks have finished and its first job will be to block with fetchResults(true)
to complete the simulation step.
When the completion task is able to proceed, its next work item will be to query the graphics component to check if another simulate()
is required or if it can exit.
In the case that another simulate()
step is required it will clearly need to pass a completion task to simulate()
.
A tricky point here is that a completion task cannot submit itself as the next completion task because it would cause an illegal recursion.
A solution to this problem might be to have two completion tasks, where each stores a reference to the other.
Each completion task can then pass its partner to simulate()
:
scene.fetchResults(true);
if(!graphics.isComplete())
{
    scene.simulate(dt, &otherCompletionTask);
}
Split sim
As an alternative to simulate()
/ fetchResults()
, a simulation step may be split into two phases: PxScene::collide()
and PxScene::advance()
.
This is known as split simulation.
The key point here is that the simulate()
/ fetchResults()
combination permits reads and writes only before simulate()
and after fetchResults()
.
The split simulation, on the other hand, relaxes this restriction and allows some reads and writes to take place at specific points during the course of a simulation step.
This shall now be explained in more detail.
When using split sim, a physics simulation step would look like this:
scene.collide(dt)
scene.fetchCollision()
scene.advance()
scene.fetchResults()
As already mentioned, split sim allows some properties to be written during the simulation step.
More specifically, some properties, known as write-through properties, may be modified in-between the return from PxScene::fetchCollision()
and the execution of the advance()
call.
This allows collide()
to begin before the data required by advance()
is available and to run in parallel with application-side logic that generates inputs to advance()
.
This is particularly useful for animation logic generating kinematic targets, and for controllers applying forces to bodies.
The write-through properties are listed below:
PxRigidDynamic/PxArticulationLink::addForce()/addTorque()/clearForce()/clearTorque()/setForceAndTorque()
PxRigidDynamic/PxArticulationLink::setAngularVelocity()/setLinearVelocity()
PxRigidDynamic/PxArticulation::wakeUp()
PxRigidDynamic/PxArticulation::setWakeCounter()
PxRigidDynamic::setKinematicTarget()
Split sim also allows API read commands to be called during collide()
and in-between the return from fetchCollision()
and the execution of the advance()
call.
These read commands are listed below:
PxRigidActor/PxArticulationLink::getGlobalPose()
PxRigidActor/PxArticulation/PxArticulationLink::getWorldBounds()
PxConstraint::getForce()
PxRigidActor/PxArticulationLink::getLinearVelocity()/getAngularVelocity()
Users can interleave the physics-dependent application logic between collide()
and advance()
as follows:
scene.collide(dt)
read poses, velocities, world bounds and constraint forces
physics-dependent game logic (animation, rendering) generating a set of modifications to apply to write-through properties before the advance() phase
scene.fetchCollision()
read more poses, velocities, world bounds and constraint forces
apply user-buffered modifications to the write-through properties
scene.advance()
scene.fetchResults()
The function fetchCollision()
will wait until collide()
has finished.
Once fetchCollision()
has completed, user-buffered modifications to write-through properties can be applied to the objects in the executing scene.
In the subsequent advance()
phase, the solver will take the modified write-through properties into account when computing the new sets of velocities and poses for the actors being simulated.
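Putting the pieces together, a split-sim frame with a write-through modification applied between fetchCollision() and advance() might look like this (kinematicActor and animTargetPose are assumed application-side variables):

scene->collide(1.0f / 60.0f);
// ... application logic running in parallel with collision, producing animTargetPose ...
scene->fetchCollision(true);
kinematicActor->setKinematicTarget(animTargetPose);   // write-through property
scene->advance();
scene->fetchResults(true);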
As a final comment, it is worth noting that illegal read and write calls are detected in all build configurations - an illegal call will immediately return with an error passed to PxErrorCallback
and the function will not be executed.
Split fetchResults
The fetchResults()
method is available in both a standard and split format.
The split format offers some advantages over the standard fetchResults()
method because it permits the user to parallelize processing of contact reports, which can be expensive when simulating complex scenes.
A simplistic way to use split fetchResults()
would look something like this:
gSharedIndex = 0;
gScene->simulate(1.0f / 60.0f);
//Call fetchResultsStart. Get the set of pair headers
const PxContactPairHeader* pairHeader;
PxU32 nbContactPairs;
gScene->fetchResultsStart(pairHeader, nbContactPairs, true);
//Set up continuation task to be run after callbacks have been processed in parallel
callbackFinishTask.setContinuation(*gScene->getTaskManager(), NULL);
callbackFinishTask.reset();
//process the callbacks
gScene->processCallbacks(&callbackFinishTask);
callbackFinishTask.removeReference();
callbackFinishTask.wait();
gScene->fetchResultsFinish();
The user is free to use their own task/threading system to process the callbacks. However, the PhysX scene provides a utility function that processes the callbacks using multiple threads, which is used in this code snippet. This method takes a continuation task that will be run when the tasks processing callbacks have completed. In this example, the completion task raises an event that can be waited upon to notify the main thread that callback processing has completed.
This feature is demonstrated in SnippetSplitFetchResults. In order to make use of this approach, contact notification callbacks must be thread-safe. Furthermore, for this approach to be beneficial, contact notification callbacks need to be doing a significant amount of work to benefit from multi-threading them.
Shifting The Scene Origin
Problems arising from the limits of floating point precision become more pronounced as objects move further from the origin. This phenomenon adversely affects large world scenarios. One solution might be to teleport all objects towards the origin with the proviso that their relative positions are preserved. The problem here is that internally cached data and persistent state will become invalid. PhysX offers an API to shift the origin of an entire scene in a way that maintains the consistency of the internally cached data and persistent state.
The function shiftOrigin()
will shift the origin of a scene by a translation vector:
PxScene::shiftOrigin(const PxVec3& shift)
The positions of all objects in the scene and the corresponding data structures will be adjusted to reflect the new origin location (basically, the shift vector will be subtracted from all object positions). The intended use pattern for this API is to shift the origin such that object positions move closer towards zero. Please note that it is the user’s responsibility to keep track of the summed total origin shift and adjust all input/output to/from PhysX accordingly. It is worth noting that this can be an expensive operation and it is recommended to use it only in the case where distance-related precision issues arise in areas far from the origin. If extension modules of PhysX, such as the character or vehicle controller, are used then it will be necessary to propagate the scene shift to those modules as well. Please refer to the API documentation of these modules for details.
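A sketch of the bookkeeping this implies might look like this (gAccumulatedShift and regionOfInterest are assumed application-side variables):

const PxVec3 shift = regionOfInterest;   // move the region of interest back towards the origin
scene->shiftOrigin(shift);
gAccumulatedShift += shift;              // the application converts positions to/from PhysX space using this total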
Solver Residual
The solver residual quantifies the convergence of the iterative physics solver. A perfectly converged solution has a residual value of zero. Each constraint contributes to the accumulated residual value. This accumulation captures the maximum value among all residuals and the root mean squared (RMS) value. Once the feature is activated by setting the ENABLE_SOLVER_RESIDUAL_REPORTING flag on the scene, residuals can be obtained from the following PhysX objects:
On the physics scene (PxScene): Residual across all solver error sources, including contacts.
On articulation roots (PxArticulationJointReducedCoordinate): Residual across all joints that are part of that articulation.
On non-articulation joints (PxConstraint via getConstraint() on PxJoint): The joint residual.
The following code demonstrates how to retrieve the residual values. Remember that the returned values are meaningless if the ENABLE_SOLVER_RESIDUAL_REPORTING flag is not set during scene construction:
PxSceneResidual sceneResidual = scene->getSolverResidual();
PxArticulationResidual articulationResidual = articulation->getSolverResidual();
PxConstraintResidual jointResidual = joint->getConstraint()->getSolverResidual();
The reported residual value encompasses all residual sources associated with the queried object. Residual values are reported for the last position and velocity iteration. Higher iteration counts typically result in lower residual values because the solver can execute more correction steps while seeking the best solution. When the TGS solver is active, the residuals tend to fluctuate more, compared to the PGS solver with identical iteration counts. This discrepancy arises from the TGS solver’s nature, which internally divides a full timestep into N substeps and approximately solves them, while the PGS solver allocates all its N iterations to solving the full timestep. In many scenarios, the convergence behavior of TGS is superior to PGS at identical iteration counts, despite the residual values suggesting otherwise, because the substepping strategy optimizes the use of every iteration, whereas later PGS iterations can only apply minor corrections. An example would be a pile of small rigid bodies stored in a jar. TGS almost certainly will result in less object overlap compared to PGS, even if the solver residuals report higher errors for the TGS solver. The takeaway is that TGS and PGS residual values are not directly comparable despite being based on the same metric.