PxDirectGPUAPI

class PxDirectGPUAPI

PxDirectGPUAPI exposes an API that enables batched direct access to GPU data for a PxScene.

The functions in this class allow batched direct access to GPU data for PxRigidDynamic, PxArticulationReducedCoordinate and PxShape types. This lets users interoperate with GPU pre- and post-processing and implement more efficient CPU-GPU data copies tailored to the needs of the application.

Using this direct-API will disable the existing CPU-based API for all the data exposed in the direct-API. For any API function that does not have a counterpart in this direct-API, the existing API will continue to work.

To use this API, PxSceneFlag::eENABLE_DIRECT_GPU_API needs to be raised, in combination with PxSceneFlag::eENABLE_GPU_DYNAMICS and PxBroadPhaseType::eGPU. These options are immutable and cannot be changed after the scene has been created.

Due to the internal architecture of the GPU-accelerated parts of PhysX, using this API comes with caveats:

1) All GPU-CPU copies for data exposed in this API will be disabled. This means that the existing CPU-based API will return outdated data, and any setters for data exposed in the interface will not work. On the other hand, significant speedups can be achieved because of the reduced number of GPU-CPU memory copies.

2) Due to the internal architecture of the GPU-accelerated PhysX, this API will only work after a first simulation step has been taken. The reason for this is that the PxScene first needs to know all the actors it will have to simulate, and set up the sizes of the GPU-based structures. For setup, the existing CPU API should be used.

Note

Because this API exposes low-level data, we reserve the right to change it without deprecation if the internal implementation changes.

Public Functions

virtual bool getRigidDynamicData(void *data, const PxRigidDynamicGPUIndex *gpuIndices, PxRigidDynamicGPUAPIReadType::Enum dataType, PxU32 nbElements, CUevent startEvent = NULL, CUevent finishEvent = NULL) const = 0

Copies the simulation state for a set of PxRigidDynamic actors into a user-provided GPU data buffer.

The gpuIndices buffer must hold nbElements * sizeof(PxRigidDynamicGPUIndex) bytes. The data requested for the PxRigidDynamic whose GPU index is at position x in the gpuIndices array will be located at position x in the data array.

See also

PxRigidDynamic::getGPUIndex(), PxRigidDynamicGPUAPIReadType.
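The position-x correspondence between gpuIndices and data can be pictured as a gather. The following is a minimal host-side sketch of that mapping only; it does not call PhysX or CUDA. The names GpuIndex and gatherByIndex are hypothetical, and treating the index type as a 32-bit unsigned integer is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for PxRigidDynamicGPUIndex (assumed here to be a
// 32-bit unsigned integer; check the PhysX headers for the real typedef).
using GpuIndex = std::uint32_t;

// CPU illustration of the documented mapping: the entry at position x of the
// output corresponds to the actor whose GPU index is stored at gpuIndices[x].
// perActorState is an illustrative array holding one value per actor,
// addressed by GPU index.
template <typename T>
std::vector<T> gatherByIndex(const std::vector<T>& perActorState,
                             const std::vector<GpuIndex>& gpuIndices) {
    std::vector<T> data(gpuIndices.size());
    for (std::size_t x = 0; x < gpuIndices.size(); ++x) {
        assert(gpuIndices[x] < perActorState.size());
        data[x] = perActorState[gpuIndices[x]];
    }
    return data;
}
```

With per-actor values {10, 11, 12, 13} and gpuIndices {3, 0, 2}, this sketch returns {13, 10, 12}: the output order follows the order of the index buffer, not the order of the actors.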

Parameters

data – [out] User-provided GPU data buffer which has size nbElements * sizeof(type). For the types, see the dataType options in PxRigidDynamicGPUAPIReadType.
gpuIndices – [in] User-provided GPU index buffer containing elements of PxRigidDynamicGPUIndex. This buffer contains the GPU indices of the PxRigidDynamic objects that are part of this get operation.
dataType – [in] The type of data to get.
nbElements – [in] The number of rigid bodies to be copied.
startEvent – [in] User-provided CUDA event that is awaited at the start of this function. Defaults to NULL which means the function will dispatch the copy immediately.
finishEvent – [in] User-provided CUDA event that is recorded at the end of this function. Defaults to NULL which means the function will wait for the copy to finish before returning.

Returns

bool Whether the operation was successful. Note that this might not include asynchronous CUDA errors.

virtual bool setRigidDynamicData(const void *data, const PxRigidDynamicGPUIndex *gpuIndices, PxRigidDynamicGPUAPIWriteType::Enum dataType, PxU32 nbElements, CUevent startEvent = NULL, CUevent finishEvent = NULL) = 0

Sets the simulation state for a set of PxRigidDynamic actors from a user-provided GPU data buffer.

The gpuIndices buffer must hold nbElements * sizeof(PxRigidDynamicGPUIndex) bytes. The data for the PxRigidDynamic whose GPU index is at position x in the gpuIndices array needs to be located at position x in the data array.

See also

PxRigidDynamic::getGPUIndex(), PxRigidDynamicGPUAPIWriteType.

Parameters

data – [in] User-provided GPU data buffer which has size nbElements * sizeof(type). For the types, see the dataType options in PxRigidDynamicGPUAPIWriteType.
gpuIndices – [in] User-provided GPU index buffer containing elements of PxRigidDynamicGPUIndex. This buffer contains the GPU indices of the PxRigidDynamic objects that are part of this set operation.
dataType – [in] The type of data to set.
nbElements – [in] The number of rigid bodies to be set.
startEvent – [in] User-provided CUDA event that is awaited at the start of this function. Defaults to NULL which means the function will dispatch the copy immediately.
finishEvent – [in] User-provided CUDA event that is recorded at the end of this function. Defaults to NULL which means the function will wait for the copy to finish before returning.

Returns

bool Whether the operation was successful. Note that this might not include asynchronous CUDA errors.

virtual bool getArticulationData(void *data, const PxArticulationGPUIndex *gpuIndices, PxArticulationGPUAPIReadType::Enum dataType, PxU32 nbElements, CUevent startEvent = NULL, CUevent finishEvent = NULL) const = 0

Gets the simulation state for a set of articulations, i.e. PxArticulationReducedCoordinate objects, and copies it into a user-provided GPU data buffer.

The data buffer must be sized according to the maximum component counts across all articulations in the PxScene, as summarised in PxArticulationGPUAPIMaxCounts. The data buffer is split into sequential blocks of equal size, each of which can hold the data for all components of one articulation. For a link-centric data type (for example PxArticulationGPUAPIReadType::eLINK_GLOBAL_POSE), each block has to be maxLinks * sizeof(dataType) bytes, so the complete buffer is nbElements * maxLinks * sizeof(dataType) bytes. For a dof-centric data type, the block size is maxDofs * sizeof(dataType). The specific layout for each dataType is detailed in the API documentation of PxArticulationGPUAPIReadType. The max counts for a scene can be obtained by calling PxDirectGPUAPI::getArticulationGPUAPIMaxCounts().

The gpuIndices buffer must hold nbElements * sizeof(PxArticulationGPUIndex) bytes. The PxArticulationReducedCoordinate whose GPU index is at position x in the gpuIndices array will have its data block located at position x in the data array. The link and dof indexing within these blocks follows the same pattern as the PxArticulationCache API. We refer to the user guide for an explanation.

See also

PxArticulationReducedCoordinate::getGPUIndex(), PxArticulationGPUAPIReadType.
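The block sizing described above comes down to a few multiplications. The sketch below shows that arithmetic under stated assumptions: MaxCounts is a hypothetical stand-in for the two PxArticulationGPUAPIMaxCounts values used here, all function names are illustrative, and the element size is passed in by the caller because the per-dataType element types are documented in PxArticulationGPUAPIReadType, not here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the two PxArticulationGPUAPIMaxCounts values used
// below; in an application they would come from getArticulationGPUAPIMaxCounts().
struct MaxCounts {
    std::uint32_t maxLinks;
    std::uint32_t maxDofs;
};

// Bytes in one articulation block of a link-centric data type.
inline std::size_t linkBlockBytes(const MaxCounts& counts, std::size_t elementBytes) {
    return static_cast<std::size_t>(counts.maxLinks) * elementBytes;
}

// Bytes in one articulation block of a dof-centric data type.
inline std::size_t dofBlockBytes(const MaxCounts& counts, std::size_t elementBytes) {
    return static_cast<std::size_t>(counts.maxDofs) * elementBytes;
}

// Bytes in the complete data buffer for nbElements articulations.
inline std::size_t dataBufferBytes(std::uint32_t nbElements, std::size_t blockBytes) {
    return static_cast<std::size_t>(nbElements) * blockBytes;
}

// Byte offset of the block that belongs to the articulation whose GPU index
// sits at position x of the gpuIndices array.
inline std::size_t blockByteOffset(std::uint32_t x, std::uint32_t nbElements,
                                   std::size_t blockBytes) {
    assert(x < nbElements);
    return static_cast<std::size_t>(x) * blockBytes;
}
```

For instance, assuming maxLinks = 5 and a 28-byte element, each link-centric block is 140 bytes and a request for 3 articulations needs a 420-byte buffer. Because all blocks are of equal size, an articulation with fewer than maxLinks links still occupies a full block.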

Parameters

data – [out] User-provided GPU data buffer that is appropriately sized for the data being requested. The sizing is explained in detail above.
gpuIndices – [in] User-provided GPU index buffer containing elements of PxArticulationGPUIndex. This buffer contains the GPU indices of the PxArticulationReducedCoordinate objects that are part of this get operation.
dataType – [in] The type of data to get.
nbElements – [in] The number of articulations to copy data from.
startEvent – [in] User-provided CUDA event that is awaited at the start of this function. Defaults to NULL which means the function will dispatch the copy immediately.
finishEvent – [in] User-provided CUDA event that is recorded at the end of this function. Defaults to NULL which means the function will wait for the copy to finish before returning.

Returns

bool Whether the operation was successful. Note that this might not include asynchronous CUDA errors.

virtual bool setArticulationData(const void *data, const PxArticulationGPUIndex *gpuIndices, PxArticulationGPUAPIWriteType::Enum dataType, PxU32 nbElements, CUevent startEvent = NULL, CUevent finishEvent = NULL) = 0

Sets the simulation state for a set of articulations, i.e. PxArticulationReducedCoordinate objects, from a user-provided GPU data buffer.

The data buffer must be sized according to the maximum component counts across all articulations in the PxScene, as summarised in PxArticulationGPUAPIMaxCounts. The data buffer is split into sequential blocks of equal size, each of which can hold the data for all components of one articulation. For a link-centric data type (for example PxArticulationGPUAPIWriteType::eLINK_FORCE), each block has to be maxLinks * sizeof(dataType) bytes, so the complete buffer is nbElements * maxLinks * sizeof(dataType) bytes. For a dof-centric data type, the block size is maxDofs * sizeof(dataType). The specific layout for each dataType is detailed in the API documentation of PxArticulationGPUAPIWriteType. The max counts for a scene can be obtained by calling PxDirectGPUAPI::getArticulationGPUAPIMaxCounts().

The gpuIndices buffer must hold nbElements * sizeof(PxArticulationGPUIndex) bytes. The PxArticulationReducedCoordinate whose GPU index is at position x in the gpuIndices array needs to have its data block located at position x in the data array. The internal indexing of these blocks follows the same pattern as the PxArticulationCache API. We refer to the user guide for an explanation.

See also

PxArticulationReducedCoordinate::getGPUIndex(), PxArticulationGPUAPIWriteType.

Parameters

data – [in] User-provided GPU data buffer that is appropriately sized for the data to be set. The sizing is explained in detail above.
gpuIndices – [in] User-provided GPU index buffer containing elements of PxArticulationGPUIndex. This buffer contains the GPU indices of the PxArticulationReducedCoordinate objects that are part of this set operation.
dataType – [in] The type of data to set.
nbElements – [in] The number of articulations to set data for.
startEvent – [in] User-provided CUDA event that is awaited at the start of this function. Defaults to NULL which means the function will dispatch the copy immediately.
finishEvent – [in] User-provided CUDA event that is recorded at the end of this function. Defaults to NULL which means the function will wait for the copy to finish before returning.

Returns

bool Whether the operation was successful. Note that this might not include asynchronous CUDA errors.

virtual bool computeArticulationData(void *data, const PxArticulationGPUIndex *gpuIndices, PxArticulationGPUAPIComputeType::Enum operation, PxU32 nbElements, CUevent startEvent = NULL, CUevent finishEvent = NULL) = 0

Performs a compute operation on a set of articulations, i.e. PxArticulationReducedCoordinate objects.

The appropriate sizing of the data buffer as well as the data layout is documented alongside the compute operations in the API documentation of PxArticulationGPUAPIComputeType.

The gpuIndices buffer must hold nbElements * sizeof(PxArticulationGPUIndex) bytes.

See also

PxArticulationReducedCoordinate::getGPUIndex()

Parameters

data – [inout] User-provided GPU data buffer that is appropriately sized for the operation to be performed. Depending on the operation, can be input or output data.
gpuIndices – [in] User-provided GPU index buffer containing elements of PxArticulationGPUIndex. This buffer contains the GPU indices of the PxArticulationReducedCoordinate objects that are part of this compute operation.
operation – [in] The operation to perform. See PxArticulationGPUAPIComputeType::Enum.
nbElements – [in] The number of articulations to perform this compute operation on.
startEvent – [in] User-provided CUDA event that is awaited at the start of this function. Defaults to NULL which means the function will dispatch the computation immediately.
finishEvent – [in] User-provided CUDA event that is recorded at the end of this function. Defaults to NULL which means the function will wait for the computation to finish before returning.

Returns

bool Whether the operation was successful. Note that this might not include asynchronous CUDA errors.

virtual bool copyContactData(void *data, PxU32 *nbContactPairs, PxU32 maxPairs, CUevent startEvent = NULL, CUevent finishEvent = NULL) const = 0

Copy rigid body (PxRigidBody) and articulation (PxArticulationReducedCoordinate) contact data to a user-provided GPU data buffer.

Note

This function only reports contact data for actor pairs where both actors are either rigid bodies or articulations.

Note

The contact data contains pointers to internal state and is only valid until the next call to simulate().

Parameters

data – [out] User-provided GPU data buffer, which should have a size of sizeof(PxGpuContactPair) * maxPairs bytes.
nbContactPairs – [out] User-provided GPU data buffer of 1 * sizeof(PxU32) bytes that receives the actual number of pairs that were written.
maxPairs – [in] The maximum number of pairs that the buffer can contain.
startEvent – [in] User-provided CUDA event that is awaited at the start of this function. Defaults to NULL which means the function will dispatch the copy immediately.
finishEvent – [in] User-provided CUDA event that is recorded at the end of this function. Defaults to NULL which means the function will wait for the copy to finish before returning.

Returns

bool Whether the operation was successful. Note that this might not include asynchronous CUDA errors.

virtual bool evaluateSDFDistances(PxVec4 *localGradientAndSignedDistanceConcatenated, const PxShapeGPUIndex *shapeIndices, const PxVec4 *localSamplePointsConcatenated, const PxU32 *samplePointCountPerShape, PxU32 nbElements, PxU32 maxPointCount, CUevent startEvent = NULL, CUevent finishEvent = NULL) const = 0

Evaluate sample point distances and gradients on SDF shapes in local space.

Local space is the space in which the mesh’s raw vertex positions are represented.

Example: Ten shapes are part of the simulation. Three of them have an SDF (the shapeIndices of the SDF meshes are 2, 4 and 6). The SDF distance should be queried at 10 sample points for the first shape, 20 sample points for the second and 30 sample points for the third. The slice size (= maxPointCount) is the maximum number of sample points required for any shape participating in the query: 30 = max(10, 20, 30) for this example. The buffers and arguments for evaluateSDFDistances are constructed as follows (not including optional parameters):

localGradientAndSignedDistanceConcatenated [length: 3 * 30] – No initialization needed. It holds the result after the finishEvent has occurred. It has the same structure as localSamplePointsConcatenated, see below. The format of each written PxVec4 is (gradX, gradY, gradZ, sdfDistance).
shapeIndices [length: 3] – The content is {2, 4, 6}, which are the shape indices for this example.
localSamplePointsConcatenated [length: 3 * 30] – Three slices of 30 elements each (the w component is unused):
    Slice 0…29 has only the first 10 elements set to local sample points with respect to the coordinate frame of the first shape to be queried.
    Slice 30…59 has only the first 20 elements set to local sample points with respect to the coordinate frame of the second shape to be queried.
    Slice 60…89 has all 30 elements set to local sample points with respect to the coordinate frame of the third shape to be queried.
samplePointCountPerShape [length: 3] – The content is {10, 20, 30}, which are the numbers of samples to evaluate per shape in this example. Note that the slice size (= maxPointCount) is the maximum value in this list.
nbElements – 3 for this example, since 3 shapes are participating in the query.
maxPointCount – 30 for this example, since 30 is the slice size (= max(10, 20, 30)).
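The slice layout of the example can be assembled on the host before it is uploaded to the GPU. The following is a minimal sketch of that packing step only: Vec4 is a hypothetical stand-in for PxVec4, SdfQueryBuffers and packSdfQuery are illustrative names that are not part of PhysX, and the upload to device memory and the call to evaluateSDFDistances are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for PxVec4; w is unused for sample points.
struct Vec4 { float x, y, z, w; };

// Host-side staging data for one query (illustrative, not a PhysX type).
struct SdfQueryBuffers {
    std::uint32_t nbElements = 0;
    std::uint32_t maxPointCount = 0;
    std::vector<std::uint32_t> samplePointCountPerShape;
    std::vector<Vec4> localSamplePointsConcatenated;
};

// Packs per-shape sample points into equally sized slices of maxPointCount
// elements; unused trailing elements of a slice are left zeroed.
inline SdfQueryBuffers packSdfQuery(const std::vector<std::vector<Vec4>>& pointsPerShape) {
    SdfQueryBuffers q;
    q.nbElements = static_cast<std::uint32_t>(pointsPerShape.size());
    for (const std::vector<Vec4>& pts : pointsPerShape) {
        const std::uint32_t count = static_cast<std::uint32_t>(pts.size());
        q.samplePointCountPerShape.push_back(count);
        q.maxPointCount = std::max(q.maxPointCount, count);
    }
    q.localSamplePointsConcatenated.assign(
        static_cast<std::size_t>(q.nbElements) * q.maxPointCount,
        Vec4{0.0f, 0.0f, 0.0f, 0.0f});
    for (std::size_t s = 0; s < pointsPerShape.size(); ++s) {
        // Slice s starts at element s * maxPointCount.
        const std::ptrdiff_t sliceStart = static_cast<std::ptrdiff_t>(s * q.maxPointCount);
        std::copy(pointsPerShape[s].begin(), pointsPerShape[s].end(),
                  q.localSamplePointsConcatenated.begin() + sliceStart);
    }
    assert(q.localSamplePointsConcatenated.size() ==
           static_cast<std::size_t>(q.nbElements) * q.maxPointCount);
    return q;
}
```

For the example with 10, 20 and 30 sample points, this yields nbElements = 3, maxPointCount = 30 and a 90-element point buffer. The output buffer localGradientAndSignedDistanceConcatenated would need the same element count.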

The shapeIndices buffer must hold nbElements * sizeof(PxShapeGPUIndex) bytes. The shapes must be triangle mesh shapes with SDFs.

See also

PxShape::getGPUIndex()

Parameters

localGradientAndSignedDistanceConcatenated – [out] User-provided GPU buffer where the evaluated gradients and distances in SDF local space get stored. It has the same structure as localSamplePointsConcatenated. The PxVec4 elements contain the gradient and the distance (gradX, gradY, gradZ, distance).
shapeIndices – [in] User-provided GPU index buffer containing elements of PxShapeGPUIndex. This buffer contains the GPU indices of the PxShape objects that are part of this operation.
localSamplePointsConcatenated – [in] User-provided GPU buffer containing the sample point locations for every shape in the shapes’ local space. The buffer stride is maxPointCount.
samplePointCountPerShape – [in] User-provided GPU buffer containing the number of sample points for every shape.
nbElements – [in] The number of shapes to be queried.
maxPointCount – [in] The maximum value in the array samplePointCountPerShape. Note that the arrays localGradientAndSignedDistanceConcatenated and localSamplePointsConcatenated must have the size (in bytes) nbElements * maxPointCount * sizeof(PxVec4).
startEvent – [in] User-provided CUDA event that is awaited at the start of this function. Defaults to NULL which means the function will dispatch the computation immediately.
finishEvent – [in] User-provided CUDA event that is recorded at the end of this function. Defaults to NULL which means the function will wait for the computation to finish before returning.

Returns

bool Whether the operation was successful. Note that this might not include asynchronous CUDA errors.

virtual PxArticulationGPUAPIMaxCounts getArticulationGPUAPIMaxCounts() const = 0

Get the maximal articulation index and component counts for a PxScene.

This is a helper function to ease the derivation of the correct data layout for the articulation functions in PxDirectGPUAPI. Specifically, this function will return maxLinks, maxDofs, maxFixedTendons, maxFixedTendonJoints, maxSpatialTendons and maxSpatialTendonAttachments for a scene.

See also

PxArticulationGPUAPIMaxCounts, PxDirectGPUAPI::getArticulationData, PxDirectGPUAPI::setArticulationData, PxDirectGPUAPI::computeArticulationData

Returns

PxArticulationGPUAPIMaxCounts The max counts across the scene for all articulation indices and components.

Protected Functions

inline PxDirectGPUAPI()

inline virtual ~PxDirectGPUAPI()