Using cuBLAS (5): cublasXt
创始人
2025-05-30 01:18:33

The cuBLASXt API of cuBLAS exposes a multi-GPU-capable host interface: when using this API, the application only needs to allocate the required matrices in host memory space. There are no restrictions on the sizes of the matrices as long as they fit in host memory. The cuBLASXt API takes care of allocating memory across the designated GPUs, dispatching the workload among them, and finally retrieving the results back to the host. The cuBLASXt API supports only the compute-intensive BLAS Level-3 routines (i.e., matrix-matrix operations), where the back-and-forth PCIe transfers to and from the GPUs can be amortized. The cuBLASXt API has its own header file, cublasXt.h.
Starting with release 8.0, the cuBLASXt API also allows any of the matrices to be located on a GPU device.
Note: the cuBLASXt API is only supported on 64-bit platforms.
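Before going into the design details, the following is a minimal sketch of a typical cuBLASXt call sequence: create a handle, select the participating GPUs, run a GEMM on host-resident matrices, and destroy the handle. The matrix sizes, the assumption of two visible GPUs, and the main() scaffolding are illustrative, not part of the library documentation.

#include <stdio.h>
#include <stdlib.h>
#include <cublasXt.h>

int main(void)
{
    const size_t m = 4096, n = 4096, k = 4096;    /* illustrative sizes */
    float *A = malloc(m * k * sizeof(float));     /* plain host memory is enough */
    float *B = malloc(k * n * sizeof(float));
    float *C = malloc(m * n * sizeof(float));
    /* ... fill A, B and C with data ... */

    cublasXtHandle_t handle;
    if (cublasXtCreate(&handle) != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cublasXtCreate failed\n");
        return EXIT_FAILURE;
    }

    int devices[2] = {0, 1};                      /* assumes two visible GPUs */
    cublasXtDeviceSelect(handle, 2, devices);

    const float alpha = 1.0f, beta = 0.0f;
    /* Blocking call: C is valid in host memory when it returns. */
    cublasXtSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                  &alpha, A, m, B, k, &beta, C, m);

    cublasXtDestroy(handle);
    free(A); free(B); free(C);
    return 0;
}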

Tiling design approach

To be able to share the workload between multiple GPUs, the cuBLASXt API uses a tiling strategy: every matrix is divided into square tiles of user-controllable dimension BlockDim x BlockDim. The resulting matrix tiling defines the static scheduling policy: each resulting tile is assigned to a GPU in a round-robin fashion. One CPU thread is created per GPU, and that thread is responsible for doing the proper memory transfers and cuBLAS operations to compute all the tiles it is in charge of. From a performance point of view, because of this static scheduling strategy, it is better if the compute capability and the PCIe bandwidth are the same for every GPU. The figure below illustrates the tile distribution among 3 GPUs. To compute the first tile G0 of C, the CPU thread 0 in charge of GPU0 has to load 3 tiles of the first row of A and the tiles of the first column of B in a pipelined fashion, in order to overlap memory transfers with computation, and to sum the results into the first tile G0 of C before moving on to the next tile G0.

When the tile dimension is not an exact multiple of the dimensions of C, some tiles are partially filled on the right and/or bottom borders. The current implementation does not pad the incomplete tiles; instead, it keeps track of those incomplete tiles by doing the right reduced cuBLAS operations, so no extra computation is done. However, it can still lead to some load imbalance when all GPUs do not have the same number of incomplete tiles to work on.
When one or more matrices are located on some GPU devices, the same tiling approach and workload sharing are applied. In that case, the memory transfers are done between devices. However, when the computation of a tile and some of the data are located on the same GPU device, the memory transfer of the local data to and from the tile is bypassed and the GPU operates directly on the local data. This can lead to a significant performance increase, especially when only one GPU is used for the computation.
The matrices can be located on any GPU device and do not have to be located on the same GPU device. Furthermore, the matrices can even be located on a GPU device that does not participate in the computation.
In contrast to the cuBLAS API, even if all matrices are located on the same device, the cuBLASXt API is still a blocking API from the host point of view: the resulting data, wherever it is located, is valid when the call returns, and no device synchronization is required.

Hybrid CPU-GPU computation

In the case of very large problems, the cuBLASXt API offers the possibility to offload some of the computation to the host CPU. This feature is set up through the routines cublasXtSetCpuRoutine() and cublasXtSetCpuRatio(). The workload assigned to the CPU is put aside: it is simply a percentage of the resulting matrix, taken from the bottom and the right side, whichever dimension is bigger. The GPU tiling is done afterwards on the reduced resulting matrix.
If any of the matrices is located on a GPU device, this feature is ignored and all the computation is done only on the GPUs.
This feature should be used with caution because it could interfere with the CPU threads responsible for feeding the GPUs.
Currently, only the cublasXtgemm() routine supports this feature.
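A minimal sketch of how this could be configured is shown below, assuming an already-initialized handle. host_sgemm is a hypothetical pointer to a CPU single-precision GEMM (for example a wrapper around an optimized CPU BLAS); its prototype must match what cublasXtSetCpuRoutine() expects for the GEMM/float combination, and the ratio is assumed here to be a fraction in [0,1].

#include <cublasXt.h>

/* Sketch: register a hypothetical CPU sgemm and route part of the workload
 * to the CPU for single-precision GEMM calls on this handle. */
cublasStatus_t enable_cpu_offload(cublasXtHandle_t handle, void *host_sgemm)
{
    cublasStatus_t status =
        cublasXtSetCpuRoutine(handle, CUBLASXT_GEMM, CUBLASXT_FLOAT, host_sgemm);
    if (status != CUBLAS_STATUS_SUCCESS)
        return status;

    /* Ask for a quarter of the resulting matrix to be computed on the CPU
     * (assuming the ratio is expressed as a fraction rather than a percentage). */
    return cublasXtSetCpuRatio(handle, CUBLASXT_GEMM, CUBLASXT_FLOAT, 0.25f);
}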

Results reproducibility

Currently, all cuBLASXt API routines from a given toolkit version generate the same bit-wise results when the following conditions are respected:

  • All GPUs participating in the computation have the same compute capability and the same number of SMs.
  • The tile dimension is kept the same between runs.
  • Either the CPU hybridization is not used, or the CPU BLAS that is provided is also guaranteed to produce reproducible results.

cuBLASXt API Datatypes Reference

cublasXtHandle_t

cublasXtHandle_t is a pointer type to an opaque structure holding the cuBLASXt API context. The cuBLASXt API context must be initialized using cublasXtCreate(), and the returned handle must be passed to all subsequent cuBLASXt API function calls. The context should be destroyed at the end using cublasXtDestroy().

 cublasXtOpType_t

The cublasXtOpType_t type enumerates the four possible types supported by the BLAS routines. This enum is used as a parameter of the routines cublasXtSetCpuRoutine() and cublasXtSetCpuRatio() to set up the hybrid configuration.

Value                   Meaning
CUBLASXT_FLOAT          float or single-precision type
CUBLASXT_DOUBLE         double-precision type
CUBLASXT_COMPLEX        single-precision complex type
CUBLASXT_DOUBLECOMPLEX  double-precision complex type

cublasXtBlasOp_t

The cublasXtBlasOp_t type enumerates the BLAS3 or BLAS3-like routines supported by the cuBLASXt API. This enum is used as a parameter of the routines cublasXtSetCpuRoutine() and cublasXtSetCpuRatio() to set up the hybrid configuration.

Value           Meaning
CUBLASXT_GEMM   GEMM routine
CUBLASXT_SYRK   SYRK routine
CUBLASXT_HERK   HERK routine
CUBLASXT_SYMM   SYMM routine
CUBLASXT_HEMM   HEMM routine
CUBLASXT_TRSM   TRSM routine
CUBLASXT_SYR2K  SYR2K routine
CUBLASXT_HER2K  HER2K routine
CUBLASXT_SPMM   SPMM routine
CUBLASXT_SYRKX  SYRKX routine
CUBLASXT_HERKX  HERKX routine

cublasXtPinningMemMode_t

This type is used to enable or disable the pinned-memory mode through the routine cublasXtSetPinningMemMode().

Value                      Meaning
CUBLASXT_PINNING_DISABLED  the Pinning Memory mode is disabled
CUBLASXT_PINNING_ENABLED   the Pinning Memory mode is enabled

cuBLASXt API Helper Function Reference

cublasXtCreate()

cublasStatus_t
cublasXtCreate(cublasXtHandle_t *handle)

This function initializes the cuBLASXt API and creates a handle to an opaque structure holding the cuBLASXt API context. It allocates hardware resources on the host and the devices, and it must be called prior to making any other cuBLASXt API calls.

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the initialization succeeded
CUBLAS_STATUS_ALLOC_FAILED      the resources could not be allocated
CUBLAS_STATUS_NOT_SUPPORTED     cuBLASXt API is only supported on 64-bit platforms
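Since every cuBLASXt entry point returns a cublasStatus_t, it is convenient to wrap calls in a small checking helper. The macro below is an illustrative sketch, not part of the library.

#include <stdio.h>
#include <stdlib.h>
#include <cublasXt.h>

/* Illustrative helper: print the failing call and its numeric status, then abort. */
#define CHECK_CUBLASXT(call)                                                \
    do {                                                                    \
        cublasStatus_t s_ = (call);                                         \
        if (s_ != CUBLAS_STATUS_SUCCESS) {                                  \
            fprintf(stderr, "%s failed with status %d\n", #call, (int)s_);  \
            exit(EXIT_FAILURE);                                             \
        }                                                                   \
    } while (0)

/* Usage:
 *   cublasXtHandle_t handle;
 *   CHECK_CUBLASXT(cublasXtCreate(&handle));
 */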

 cublasXtDestroy()

cublasStatus_t
cublasXtDestroy(cublasXtHandle_t handle)

This function releases the hardware resources used by the cuBLASXt API context. The release of GPU resources may be deferred until the application exits. This function is usually the last call with a particular handle to the cuBLASXt API.

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the shut down succeeded
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized

 cublasXtDeviceSelect()

cublasStatus_t
cublasXtDeviceSelect(cublasXtHandle_t handle, int nbDevices, int deviceId[])

This function allows the user to provide the number of GPU devices, and their respective IDs, that will participate in the subsequent cuBLASXt API math function calls. This function creates a cuBLAS context for every GPU provided in that list. Currently, the device configuration is static and cannot be changed between math function calls. In that regard, this function should be called only once, after cublasXtCreate(). To be able to run multiple configurations, multiple cuBLASXt API contexts should be created.

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the user call was successful
CUBLAS_STATUS_INVALID_VALUE     access to at least one of the devices could not be done, or a cuBLAS context could not be created on at least one of the devices
CUBLAS_STATUS_ALLOC_FAILED      some resources could not be allocated
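A sketch of selecting every visible GPU for a freshly created handle is shown below; using cudaGetDeviceCount() to build the ID list and the 16-device cap are assumptions of this example, not requirements of the API.

#include <cuda_runtime.h>
#include <cublasXt.h>

/* Sketch: let the given cuBLASXt handle use all visible GPUs. */
cublasStatus_t select_all_gpus(cublasXtHandle_t handle)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count <= 0)
        return CUBLAS_STATUS_INVALID_VALUE;   /* arbitrary error choice for this sketch */

    int ids[16];                              /* assumes at most 16 devices */
    if (count > 16)
        count = 16;
    for (int i = 0; i < count; ++i)
        ids[i] = i;

    /* Should be called only once, right after cublasXtCreate(). */
    return cublasXtDeviceSelect(handle, count, ids);
}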

 cublasXtSetBlockDim()

cublasStatus_t
cublasXtSetBlockDim(cublasXtHandle_t handle, int blockDim)

This function allows the user to set the block dimension used for the tiling of the matrices in the subsequent math function calls. The matrices are split into square tiles of blockDim x blockDim dimension. This function can be called at any time and takes effect for the following math function calls. The block dimension should be chosen so as to optimize the math operation and to make sure that the PCIe transfers are well overlapped with the computation.

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the call has been successful
CUBLAS_STATUS_INVALID_VALUE     blockDim <= 0
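For example, a larger tile can be requested before launching a big GEMM; the value 4096 below is only an illustrative starting point and should be tuned for the actual GPUs and PCIe bandwidth.

#include <cublasXt.h>

/* Sketch: request 4096 x 4096 tiles for the subsequent math calls on this handle. */
cublasStatus_t use_larger_tiles(cublasXtHandle_t handle)
{
    /* Returns CUBLAS_STATUS_INVALID_VALUE if the requested dimension is <= 0. */
    return cublasXtSetBlockDim(handle, 4096);
}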

cublasXtGetBlockDim()

cublasStatus_t
cublasXtGetBlockDim(cublasXtHandle_t handle, int *blockDim)

This function allows the user to query the block dimension used for the tiling of the matrices.

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the call has been successful

cublasXtSetCpuRoutine()

cublasStatus_t
cublasXtSetCpuRoutine(cublasXtHandle_t handle, cublasXtBlasOp_t blasOp, cublasXtOpType_t type, void *blasFunctor)

This function allows the user to provide a CPU implementation of the corresponding BLAS routine. It can be used together with cublasXtSetCpuRatio() to define a hybrid computation between the CPU and the GPUs. Currently, the hybrid feature is supported only for the xGEMM routines.

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the call has been successful
CUBLAS_STATUS_INVALID_VALUE     blasOp or type define an invalid combination
CUBLAS_STATUS_NOT_SUPPORTED     CPU-GPU Hybridization for that routine is not supported

cublasXtSetCpuRatio()

cublasStatus_t
cublasXtSetCpuRatio(cublasXtHandle_t handle, cublasXtBlasOp_t blasOp, cublasXtOpType_t type, float ratio)

This function allows the user to define the percentage of workload that should be done on the CPU in the context of a hybrid computation. It can be used together with cublasXtSetCpuRoutine() to define a hybrid computation between the CPU and the GPUs. Currently, the hybrid feature is supported only for the xGEMM routines.

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the call has been successful
CUBLAS_STATUS_INVALID_VALUE     blasOp or type define an invalid combination
CUBLAS_STATUS_NOT_SUPPORTED     CPU-GPU Hybridization for that routine is not supported

 cublasXtSetPinningMemMode()

cublasStatus_t
cublasXtSetPinningMemMode(cublasXtHandle_t handle, cublasXtPinningMemMode_t mode)

This function allows the user to enable or disable the pinned-memory mode. When enabled, the matrices passed in subsequent cuBLASXt API calls will be pinned/unpinned using the CUDA runtime routines cudaHostRegister() and cudaHostUnregister(), respectively, if the matrices are not already pinned. A matrix that happens to be only partially pinned will not be pinned either. Pinning the memory improves PCIe transfer performance and allows PCIe memory transfers to overlap with computation. However, pinning/unpinning the memory takes some time, which might not be amortized. It is advised that users pin the memory on their own, using cudaMallocHost() or cudaHostRegister(), and unpin it when the computation sequence is completed. By default, the pinned-memory mode is disabled.

The pinned-memory mode should not be enabled when matrices used in different cuBLASXt API calls overlap. cuBLASXt determines whether a matrix is pinned by querying its first address with cudaHostGetFlags(), so it cannot know whether the matrix is already partially pinned. This is especially true in multi-threaded applications, where memory could be partially or totally pinned or unpinned while another thread is accessing it.

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the call has been successful
CUBLAS_STATUS_INVALID_VALUE     the mode value is different from CUBLASXT_PINNING_DISABLED and CUBLASXT_PINNING_ENABLED
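The sketch below shows the explicit alternative recommended above: pin a host buffer once with cudaHostRegister() around a whole sequence of calls, instead of enabling CUBLASXT_PINNING_ENABLED on the handle. The buffer pointer and size are illustrative placeholders.

#include <cuda_runtime.h>
#include <cublasXt.h>

/* Sketch: pin one host matrix explicitly for the duration of a compute sequence. */
void run_with_pinned_buffer(cublasXtHandle_t handle, float *buf, size_t bytes)
{
    cudaHostRegister(buf, bytes, cudaHostRegisterDefault);

    /* ... cuBLASXt math calls on this handle that read or write buf ... */
    (void)handle;   /* placeholder: silences the unused-parameter warning in this stub */

    cudaHostUnregister(buf);
}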

 cublasXtGetPinningMemMode()

cublasStatus_t
cublasXtGetPinningMemMode(cublasXtHandle_t handle, cublasXtPinningMemMode_t *mode)

This function allows the user to query the pinned-memory mode. By default, the pinned-memory mode is disabled.

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the call has been successful

cuBLASXt API Math Functions Reference

In this chapter we describe the actual linear algebra routines that the cuBLASXt API supports. We use the abbreviation <type> for the data type and <t> for the corresponding short type, to make a more concise and clear presentation of the implemented functions. Unless otherwise specified, <type> and <t> have the following meanings:

<type>           <t>           Meaning
float            's' or 'S'    real single-precision
double           'd' or 'D'    real double-precision
cuComplex        'c' or 'C'    complex single-precision
cuDoubleComplex  'z' or 'Z'    complex double-precision

 cublasXtgemm()

cublasStatus_t cublasXtSgemm(cublasXtHandle_t handle,cublasOperation_t transa, cublasOperation_t transb,size_t m, size_t n, size_t k,const float           *alpha,const float           *A, int lda,const float           *B, int ldb,const float           *beta,float           *C, int ldc)
cublasStatus_t cublasXtDgemm(cublasXtHandle_t handle,cublasOperation_t transa, cublasOperation_t transb,int m, int n, int k,const double          *alpha,const double          *A, int lda,const double          *B, int ldb,const double          *beta,double          *C, int ldc)
cublasStatus_t cublasXtCgemm(cublasXtHandle_t handle,cublasOperation_t transa, cublasOperation_t transb,int m, int n, int k,const cuComplex       *alpha,const cuComplex       *A, int lda,const cuComplex       *B, int ldb,const cuComplex       *beta,cuComplex       *C, int ldc)
cublasStatus_t cublasXtZgemm(cublasXtHandle_t handle,cublasOperation_t transa, cublasOperation_t transb,int m, int n, int k,const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,const cuDoubleComplex *B, int ldb,const cuDoubleComplex *beta,cuDoubleComplex *C, int ldc)

This function performs the matrix-matrix multiplication

C = alpha * op(A) * op(B) + beta * C

where alpha and beta are scalars, and A, B and C are matrices stored in column-major format, with op(A) of dimensions m x k, op(B) of dimensions k x n and C of dimensions m x n. For matrix A, op(A) = A if transa == CUBLAS_OP_N, op(A) = A^T if transa == CUBLAS_OP_T, and op(A) = A^H if transa == CUBLAS_OP_C; op(B) is defined similarly by transb.

Param.  Memory          In/out  Meaning
handle                  input   handle to the cuBLASXt API context.
transa                  input   operation op(A) that is non- or (conj.) transpose.
transb                  input   operation op(B) that is non- or (conj.) transpose.
m                       input   number of rows of matrix op(A) and C.
n                       input   number of columns of matrix op(B) and C.
k                       input   number of columns of op(A) and rows of op(B).
alpha   host            input   scalar used for multiplication.
A       host or device  input   array of dimensions lda x k with lda>=max(1,m) if transa == CUBLAS_OP_N and lda x m with lda>=max(1,k) otherwise.
lda                     input   leading dimension of two-dimensional array used to store the matrix A.
B       host or device  input   array of dimension ldb x n with ldb>=max(1,k) if transb == CUBLAS_OP_N and ldb x k with ldb>=max(1,n) otherwise.
ldb                     input   leading dimension of two-dimensional array used to store matrix B.
beta    host            input   scalar used for multiplication. If beta == 0, C does not have to be a valid input.
C       host or device  in/out  array of dimensions ldc x n with ldc>=max(1,m).
ldc                     input   leading dimension of a two-dimensional array used to store the matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value                     Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized
CUBLAS_STATUS_INVALID_VALUE     the parameters m,n,k<0
CUBLAS_STATUS_EXECUTION_FAILED  the function failed to launch on the GPU
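Since any operand may also live in GPU memory (as noted for release 8.0 and later), a variant of the call can, for instance, keep C resident on a device while A and B stay in host memory; the device choice and the column-major leading dimensions below are illustrative.

#include <cuda_runtime.h>
#include <cublasXt.h>

/* Sketch: A and B in host memory, C resident on device 0; the call is still
 * blocking, so dC holds the valid result when it returns. */
void gemm_with_device_result(cublasXtHandle_t handle,
                             const float *A, const float *B,
                             size_t m, size_t n, size_t k)
{
    float *dC = NULL;
    cudaSetDevice(0);
    cudaMalloc((void **)&dC, m * n * sizeof(float));

    const float alpha = 1.0f, beta = 0.0f;   /* beta == 0: dC need not be initialized */
    cublasXtSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                  &alpha, A, m, B, k, &beta, dC, m);

    /* ... use dC on the device, or copy it back with cudaMemcpy ... */
    cudaFree(dC);
}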

cublasXthemm()

cublasStatus_t cublasXtChemm(cublasXtHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,size_t m, size_t n,const cuComplex       *alpha,const cuComplex       *A, size_t lda,const cuComplex       *B, size_t ldb,const cuComplex       *beta,cuComplex       *C, size_t ldc)
cublasStatus_t cublasXtZhemm(cublasXtHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,size_t m, size_t n,const cuDoubleComplex *alpha,const cuDoubleComplex *A, size_t lda,const cuDoubleComplex *B, size_t ldb,const cuDoubleComplex *beta,cuDoubleComplex *C, size_t ldc)

This function performs the Hermitian matrix-matrix multiplication

C = alpha * A * B + beta * C   if side == CUBLAS_SIDE_LEFT
C = alpha * B * A + beta * C   if side == CUBLAS_SIDE_RIGHT

where A is a Hermitian matrix stored in lower or upper mode, B and C are m x n matrices, and alpha and beta are scalars.

Param.  Memory          In/out  Meaning
handle                  input   handle to the cuBLASXt API context.
side                    input   indicates if matrix A is on the left or right of B.
uplo                    input   indicates if matrix A lower or upper part is stored, the other Hermitian part is not referenced and is inferred from the stored elements.
m                       input   number of rows of matrix C and B, with matrix A sized accordingly.
n                       input   number of columns of matrix C and B, with matrix A sized accordingly.
alpha   host            input   scalar used for multiplication.
A       host or device  input   array of dimension lda x m with lda>=max(1,m) if side == CUBLAS_SIDE_LEFT and lda x n with lda>=max(1,n) otherwise. The imaginary parts of the diagonal elements are assumed to be zero.
lda                     input   leading dimension of two-dimensional array used to store matrix A.
B       host or device  input   array of dimension ldb x n with ldb>=max(1,m).
ldb                     input   leading dimension of two-dimensional array used to store matrix B.
beta    host            input   scalar used for multiplication, if beta == 0 then C does not have to be a valid input.
C       host or device  in/out  array of dimensions ldc x n with ldc>=max(1,m).
ldc                     input   leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value                     Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized
CUBLAS_STATUS_INVALID_VALUE     the parameters m,n<0
CUBLAS_STATUS_EXECUTION_FAILED  the function failed to launch on the GPU

 cublasXtsymm()

cublasStatus_t cublasXtSsymm(cublasXtHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,size_t m, size_t n,const float           *alpha,const float           *A, size_t lda,const float           *B, size_t ldb,const float           *beta,float           *C, size_t ldc)
cublasStatus_t cublasXtDsymm(cublasXtHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,size_t m, size_t n,const double          *alpha,const double          *A, size_t lda,const double          *B, size_t ldb,const double          *beta,double          *C, size_t ldc)
cublasStatus_t cublasXtCsymm(cublasXtHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,size_t m, size_t n,const cuComplex       *alpha,const cuComplex       *A, size_t lda,const cuComplex       *B, size_t ldb,const cuComplex       *beta,cuComplex       *C, size_t ldc)
cublasStatus_t cublasXtZsymm(cublasXtHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,size_t m, size_t n,const cuDoubleComplex *alpha,const cuDoubleComplex *A, size_t lda,const cuDoubleComplex *B, size_t ldb,const cuDoubleComplex *beta,cuDoubleComplex *C, size_t ldc)

This function performs the symmetric matrix-matrix multiplication

C = alpha * A * B + beta * C   if side == CUBLAS_SIDE_LEFT
C = alpha * B * A + beta * C   if side == CUBLAS_SIDE_RIGHT

where A is a symmetric matrix stored in lower or upper mode, B and C are m x n matrices, and alpha and beta are scalars.

Param.  Memory          In/out  Meaning
handle                  input   handle to the cuBLASXt API context.
side                    input   indicates if matrix A is on the left or right of B.
uplo                    input   indicates if matrix A lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
m                       input   number of rows of matrix C and B, with matrix A sized accordingly.
n                       input   number of columns of matrix C and B, with matrix A sized accordingly.
alpha   host            input   scalar used for multiplication.
A       host or device  input   array of dimension lda x m with lda>=max(1,m) if side == CUBLAS_SIDE_LEFT and lda x n with lda>=max(1,n) otherwise.
lda                     input   leading dimension of two-dimensional array used to store matrix A.
B       host or device  input   array of dimension ldb x n with ldb>=max(1,m).
ldb                     input   leading dimension of two-dimensional array used to store matrix B.
beta    host            input   scalar used for multiplication, if beta == 0 then C does not have to be a valid input.
C       host or device  in/out  array of dimension ldc x n with ldc>=max(1,m).
ldc                     input   leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value                     Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized
CUBLAS_STATUS_INVALID_VALUE     the parameters m,n<0
CUBLAS_STATUS_EXECUTION_FAILED  the function failed to launch on the GPU

cublasXtsyrk()

cublasStatus_t cublasXtSsyrk(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const float           *alpha,const float           *A, int lda,const float           *beta,float           *C, int ldc)
cublasStatus_t cublasXtDsyrk(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const double          *alpha,const double          *A, int lda,const double          *beta,double          *C, int ldc)
cublasStatus_t cublasXtCsyrk(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const cuComplex       *alpha,const cuComplex       *A, int lda,const cuComplex       *beta,cuComplex       *C, int ldc)
cublasStatus_t cublasXtZsyrk(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,const cuDoubleComplex *beta,cuDoubleComplex *C, int ldc)

This function performs the symmetric rank-k update

C = alpha * op(A) * op(A)^T + beta * C

where alpha and beta are scalars, C is an n x n symmetric matrix stored in lower or upper mode, and op(A) is an n x k matrix with op(A) = A if trans == CUBLAS_OP_N and op(A) = A^T otherwise.

Param.  Memory          In/out  Meaning
handle                  input   handle to the cuBLASXt API context.
uplo                    input   indicates if matrix C lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
trans                   input   operation op(A) that is non- or transpose.
n                       input   number of rows of matrix op(A) and C.
k                       input   number of columns of matrix op(A).
alpha   host            input   scalar used for multiplication.
A       host or device  input   array of dimension lda x k with lda>=max(1,n) if trans == CUBLAS_OP_N and lda x n with lda>=max(1,k) otherwise.
lda                     input   leading dimension of two-dimensional array used to store matrix A.
beta    host            input   scalar used for multiplication, if beta == 0 then C does not have to be a valid input.
C       host or device  in/out  array of dimension ldc x n, with ldc>=max(1,n).
ldc                     input   leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value                     Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized
CUBLAS_STATUS_INVALID_VALUE     the parameters n,k<0
CUBLAS_STATUS_EXECUTION_FAILED  the function failed to launch on the GPU

 cublasXtsyr2k()

cublasStatus_t cublasXtSsyr2k(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,size_t n, size_t k,const float           *alpha,const float           *A, size_t lda,const float           *B, size_t ldb,const float           *beta,float           *C, size_t ldc)
cublasStatus_t cublasXtDsyr2k(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,size_t n, size_t k,const double          *alpha,const double          *A, size_t lda,const double          *B, size_t ldb,const double          *beta,double          *C, size_t ldc)
cublasStatus_t cublasXtCsyr2k(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,size_t n, size_t k,const cuComplex       *alpha,const cuComplex       *A, size_t lda,const cuComplex       *B, size_t ldb,const cuComplex       *beta,cuComplex       *C, size_t ldc)
cublasStatus_t cublasXtZsyr2k(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,size_t n, size_t k,const cuDoubleComplex *alpha,const cuDoubleComplex *A, size_t lda,const cuDoubleComplex *B, size_t ldb,const cuDoubleComplex *beta,cuDoubleComplex *C, size_t ldc)

This function performs the symmetric rank-2k update

C = alpha * (op(A) * op(B)^T + op(B) * op(A)^T) + beta * C

where alpha and beta are scalars, C is an n x n symmetric matrix stored in lower or upper mode, and op(A) and op(B) are n x k matrices transformed according to trans as in cublasXtsyrk().

Param.  Memory          In/out  Meaning
handle                  input   handle to the cuBLASXt API context.
uplo                    input   indicates if matrix C lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
trans                   input   operation op(A) that is non- or transpose.
n                       input   number of rows of matrix op(A), op(B) and C.
k                       input   number of columns of matrix op(A) and op(B).
alpha   host            input   scalar used for multiplication.
A       host or device  input   array of dimension lda x k with lda>=max(1,n) if trans == CUBLAS_OP_N and lda x n with lda>=max(1,k) otherwise.
lda                     input   leading dimension of two-dimensional array used to store matrix A.
B       host or device  input   array of dimensions ldb x k with ldb>=max(1,n) if trans == CUBLAS_OP_N and ldb x n with ldb>=max(1,k) otherwise.
ldb                     input   leading dimension of two-dimensional array used to store matrix B.
beta    host            input   scalar used for multiplication, if beta == 0 then C does not have to be a valid input.
C       host or device  in/out  array of dimensions ldc x n with ldc>=max(1,n).
ldc                     input   leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value                     Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized
CUBLAS_STATUS_INVALID_VALUE     the parameters n,k<0
CUBLAS_STATUS_EXECUTION_FAILED  the function failed to launch on the GPU

 cublasXtsyrkx()

cublasStatus_t cublasXtSsyrkx(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,size_t n, size_t k,const float           *alpha,const float           *A, size_t lda,const float           *B, size_t ldb,const float           *beta,float           *C, size_t ldc)
cublasStatus_t cublasXtDsyrkx(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,size_t n, size_t k,const double          *alpha,const double          *A, size_t lda,const double          *B, size_t ldb,const double          *beta,double          *C, size_t ldc)
cublasStatus_t cublasXtCsyrkx(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,size_t n, size_t k,const cuComplex       *alpha,const cuComplex       *A, size_t lda,const cuComplex       *B, size_t ldb,const cuComplex       *beta,cuComplex       *C, size_t ldc)
cublasStatus_t cublasXtZsyrkx(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,size_t n, size_t k,const cuDoubleComplex *alpha,const cuDoubleComplex *A, size_t lda,const cuDoubleComplex *B, size_t ldb,const cuDoubleComplex *beta,cuDoubleComplex *C, size_t ldc)

This function performs a variation of the symmetric rank-k update

C = alpha * op(A) * op(B)^T + beta * C

where alpha and beta are scalars, C is an n x n symmetric matrix stored in lower or upper mode, and op(A) and op(B) are n x k matrices transformed according to trans.

Param.  Memory          In/out  Meaning
handle                  input   handle to the cuBLASXt API context.
uplo                    input   indicates if matrix C lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
trans                   input   operation op(A) that is non- or transpose.
n                       input   number of rows of matrix op(A), op(B) and C.
k                       input   number of columns of matrix op(A) and op(B).
alpha   host            input   scalar used for multiplication.
A       host or device  input   array of dimension lda x k with lda>=max(1,n) if trans == CUBLAS_OP_N and lda x n with lda>=max(1,k) otherwise.
lda                     input   leading dimension of two-dimensional array used to store matrix A.
B       host or device  input   array of dimensions ldb x k with ldb>=max(1,n) if trans == CUBLAS_OP_N and ldb x n with ldb>=max(1,k) otherwise.
ldb                     input   leading dimension of two-dimensional array used to store matrix B.
beta    host            input   scalar used for multiplication, if beta == 0 then C does not have to be a valid input.
C       host or device  in/out  array of dimensions ldc x n with ldc>=max(1,n).
ldc                     input   leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value                     Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized
CUBLAS_STATUS_INVALID_VALUE     the parameters n,k<0
CUBLAS_STATUS_EXECUTION_FAILED  the function failed to launch on the GPU

cublasXtherk()

cublasStatus_t cublasXtCherk(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const float  *alpha,const cuComplex       *A, int lda,const float  *beta,cuComplex       *C, int ldc)
cublasStatus_t cublasXtZherk(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,int n, int k,const double *alpha,const cuDoubleComplex *A, int lda,const double *beta,cuDoubleComplex *C, int ldc)

This function performs the Hermitian rank-k update

C = alpha * op(A) * op(A)^H + beta * C

where alpha and beta are real scalars, C is an n x n Hermitian matrix stored in lower or upper mode, and op(A) is an n x k matrix with op(A) = A if trans == CUBLAS_OP_N and op(A) = A^H otherwise.

Param.  Memory          In/out  Meaning
handle                  input   handle to the cuBLASXt API context.
uplo                    input   indicates if matrix C lower or upper part is stored, the other Hermitian part is not referenced and is inferred from the stored elements.
trans                   input   operation op(A) that is non- or (conj.) transpose.
n                       input   number of rows of matrix op(A) and C.
k                       input   number of columns of matrix op(A).
alpha   host            input   scalar used for multiplication.
A       host or device  input   array of dimension lda x k with lda>=max(1,n) if trans == CUBLAS_OP_N and lda x n with lda>=max(1,k) otherwise.
lda                     input   leading dimension of two-dimensional array used to store matrix A.
beta    host            input   scalar used for multiplication, if beta == 0 then C does not have to be a valid input.
C       host or device  in/out  array of dimension ldc x n, with ldc>=max(1,n). The imaginary parts of the diagonal elements are assumed and set to zero.
ldc                     input   leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value                     Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized
CUBLAS_STATUS_INVALID_VALUE     the parameters n,k<0
CUBLAS_STATUS_EXECUTION_FAILED  the function failed to launch on the GPU

cublasXther2k()

cublasStatus_t cublasXtCher2k(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,size_t n, size_t k,const cuComplex       *alpha,const cuComplex       *A, size_t lda,const cuComplex       *B, size_t ldb,const float  *beta,cuComplex       *C, size_t ldc)
cublasStatus_t cublasXtZher2k(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,size_t n, size_t k,const cuDoubleComplex *alpha,const cuDoubleComplex *A, size_t lda,const cuDoubleComplex *B, size_t ldb,const double *beta,cuDoubleComplex *C, size_t ldc)

This function performs the Hermitian rank-2k update

C = alpha * op(A) * op(B)^H + conj(alpha) * op(B) * op(A)^H + beta * C

where alpha is a complex scalar, beta is a real scalar, C is an n x n Hermitian matrix stored in lower or upper mode, and op(A) and op(B) are n x k matrices transformed according to trans.

Param.  Memory          In/out  Meaning
handle                  input   handle to the cuBLASXt API context.
uplo                    input   indicates if matrix C lower or upper part is stored, the other Hermitian part is not referenced and is inferred from the stored elements.
trans                   input   operation op(A) that is non- or (conj.) transpose.
n                       input   number of rows of matrix op(A), op(B) and C.
k                       input   number of columns of matrix op(A) and op(B).
alpha   host            input   scalar used for multiplication.
A       host or device  input   array of dimension lda x k with lda>=max(1,n) if trans == CUBLAS_OP_N and lda x n with lda>=max(1,k) otherwise.
lda                     input   leading dimension of two-dimensional array used to store matrix A.
B       host or device  input   array of dimension ldb x k with ldb>=max(1,n) if trans == CUBLAS_OP_N and ldb x n with ldb>=max(1,k) otherwise.
ldb                     input   leading dimension of two-dimensional array used to store matrix B.
beta    host            input   scalar used for multiplication, if beta == 0 then C does not have to be a valid input.
C       host or device  in/out  array of dimension ldc x n, with ldc>=max(1,n). The imaginary parts of the diagonal elements are assumed and set to zero.
ldc                     input   leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value                     Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized
CUBLAS_STATUS_INVALID_VALUE     the parameters n,k<0
CUBLAS_STATUS_EXECUTION_FAILED  the function failed to launch on the GPU

 cublasXtherkx()

cublasStatus_t cublasXtCherkx(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,size_t n, size_t k,const cuComplex       *alpha,const cuComplex       *A, size_t lda,const cuComplex       *B, size_t ldb,const float  *beta,cuComplex       *C, size_t ldc)
cublasStatus_t cublasXtZherkx(cublasXtHandle_t handle,cublasFillMode_t uplo, cublasOperation_t trans,size_t n, size_t k,const cuDoubleComplex *alpha,const cuDoubleComplex *A, size_t lda,const cuDoubleComplex *B, size_t ldb,const double *beta,cuDoubleComplex *C, size_t ldc)

This function performs a variation of the Hermitian rank-k update

C = alpha * op(A) * op(B)^H + beta * C

where alpha is a complex scalar, beta is a real scalar, C is an n x n Hermitian matrix stored in lower or upper mode, and op(A) and op(B) are n x k matrices transformed according to trans.

Param.  Memory          In/out  Meaning
handle                  input   handle to the cuBLASXt API context.
uplo                    input   indicates if matrix C lower or upper part is stored, the other Hermitian part is not referenced and is inferred from the stored elements.
trans                   input   operation op(A) that is non- or (conj.) transpose.
n                       input   number of rows of matrix op(A), op(B) and C.
k                       input   number of columns of matrix op(A) and op(B).
alpha   host            input   scalar used for multiplication.
A       host or device  input   array of dimension lda x k with lda>=max(1,n) if trans == CUBLAS_OP_N and lda x n with lda>=max(1,k) otherwise.
lda                     input   leading dimension of two-dimensional array used to store matrix A.
B       host or device  input   array of dimension ldb x k with ldb>=max(1,n) if trans == CUBLAS_OP_N and ldb x n with ldb>=max(1,k) otherwise.
ldb                     input   leading dimension of two-dimensional array used to store matrix B.
beta    host            input   real scalar used for multiplication, if beta == 0 then C does not have to be a valid input.
C       host or device  in/out  array of dimension ldc x n, with ldc>=max(1,n). The imaginary parts of the diagonal elements are assumed and set to zero.
ldc                     input   leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value                     Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized
CUBLAS_STATUS_INVALID_VALUE     the parameters n,k<0
CUBLAS_STATUS_EXECUTION_FAILED  the function failed to launch on the GPU

 cublasXttrsm()

cublasStatus_t cublasXtStrsm(cublasXtHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,size_t m, size_t n,const float           *alpha,const float           *A, size_t lda,float           *B, size_t ldb)
cublasStatus_t cublasXtDtrsm(cublasXtHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,size_t m, size_t n,const double          *alpha,const double          *A, size_t lda,double          *B, size_t ldb)
cublasStatus_t cublasXtCtrsm(cublasXtHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,size_t m, size_t n,const cuComplex       *alpha,const cuComplex       *A, size_t lda,cuComplex       *B, size_t ldb)
cublasStatus_t cublasXtZtrsm(cublasXtHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,size_t m, size_t n,const cuDoubleComplex *alpha,const cuDoubleComplex *A, size_t lda,cuDoubleComplex *B, size_t ldb)

This function solves the triangular linear system with multiple right-hand sides

op(A) * X = alpha * B   if side == CUBLAS_SIDE_LEFT
X * op(A) = alpha * B   if side == CUBLAS_SIDE_RIGHT

where A is a triangular matrix stored in lower or upper mode, with or without the main diagonal, X and B are m x n matrices, and alpha is a scalar. The solution X overwrites the right-hand side B on exit.

Param.  Memory          In/out  Meaning
handle                  input   handle to the cuBLASXt API context.
side                    input   indicates if matrix A is on the left or right of X.
uplo                    input   indicates if matrix A lower or upper part is stored, the other part is not referenced and is inferred from the stored elements.
trans                   input   operation op(A) that is non- or (conj.) transpose.
diag                    input   indicates if the elements on the main diagonal of matrix A are unity and should not be accessed.
m                       input   number of rows of matrix B, with matrix A sized accordingly.
n                       input   number of columns of matrix B, with matrix A sized accordingly.
alpha   host            input   scalar used for multiplication, if alpha == 0 then A is not referenced and B does not have to be a valid input.
A       host or device  input   array of dimension lda x m with lda>=max(1,m) if side == CUBLAS_SIDE_LEFT and lda x n with lda>=max(1,n) otherwise.
lda                     input   leading dimension of two-dimensional array used to store matrix A.
B       host or device  in/out  array of dimension ldb x n with ldb>=max(1,m).
ldb                     input   leading dimension of two-dimensional array used to store matrix B.

The possible error values returned by this function and their meanings are listed below.

Error Value                     Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized
CUBLAS_STATUS_INVALID_VALUE     the parameters m,n<0
CUBLAS_STATUS_EXECUTION_FAILED  the function failed to launch on the GPU
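As with the other routines, a host-resident triangular solve only needs the handle and column-major host arrays. The sketch below overwrites B with the solution X of A * X = B for a lower-triangular, non-unit-diagonal A; the sizes and leading dimensions are illustrative.

#include <cublasXt.h>

/* Sketch: solve A * X = B where A is m x m lower triangular and B is m x n;
 * on return B holds X. Both arrays are column-major in host memory. */
cublasStatus_t solve_lower_triangular(cublasXtHandle_t handle,
                                      const float *A, float *B,
                                      size_t m, size_t n)
{
    const float alpha = 1.0f;
    return cublasXtStrsm(handle, CUBLAS_SIDE_LEFT, CUBLAS_FILL_MODE_LOWER,
                         CUBLAS_OP_N, CUBLAS_DIAG_NON_UNIT,
                         m, n, &alpha, A, m, B, m);
}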

 cublasXttrmm()

cublasStatus_t cublasXtStrmm(cublasXtHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,size_t m, size_t n,const float           *alpha,const float           *A, size_t lda,const float           *B, size_t ldb,float                 *C, size_t ldc)
cublasStatus_t cublasXtDtrmm(cublasXtHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,size_t m, size_t n,const double          *alpha,const double          *A, size_t lda,const double          *B, size_t ldb,double                *C, size_t ldc)
cublasStatus_t cublasXtCtrmm(cublasXtHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,size_t m, size_t n,const cuComplex       *alpha,const cuComplex       *A, size_t lda,const cuComplex       *B, size_t ldb,cuComplex             *C, size_t ldc)
cublasStatus_t cublasXtZtrmm(cublasXtHandle_t handle,cublasSideMode_t side, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,size_t m, size_t n,const cuDoubleComplex *alpha,const cuDoubleComplex *A, size_t lda,const cuDoubleComplex *B, size_t ldb,cuDoubleComplex       *C, size_t ldc)

This function performs the triangular matrix-matrix multiplication

C = alpha * op(A) * B   if side == CUBLAS_SIDE_LEFT
C = alpha * B * op(A)   if side == CUBLAS_SIDE_RIGHT

where A is a triangular matrix stored in lower or upper mode, with or without the main diagonal, B and C are m x n matrices, and alpha is a scalar.

Param.  Memory          In/out  Meaning
handle                  input   handle to the cuBLASXt API context.
side                    input   indicates if matrix A is on the left or right of B.
uplo                    input   indicates if matrix A lower or upper part is stored, the other part is not referenced and is inferred from the stored elements.
trans                   input   operation op(A) that is non- or (conj.) transpose.
diag                    input   indicates if the elements on the main diagonal of matrix A are unity and should not be accessed.
m                       input   number of rows of matrix B, with matrix A sized accordingly.
n                       input   number of columns of matrix B, with matrix A sized accordingly.
alpha   host            input   scalar used for multiplication, if alpha == 0 then A is not referenced and B does not have to be a valid input.
A       host or device  input   array of dimension lda x m with lda>=max(1,m) if side == CUBLAS_SIDE_LEFT and lda x n with lda>=max(1,n) otherwise.
lda                     input   leading dimension of two-dimensional array used to store matrix A.
B       host or device  input   array of dimension ldb x n with ldb>=max(1,m).
ldb                     input   leading dimension of two-dimensional array used to store matrix B.
C       host or device  in/out  array of dimension ldc x n with ldc>=max(1,m).
ldc                     input   leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value                     Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized
CUBLAS_STATUS_INVALID_VALUE     the parameters m,n<0
CUBLAS_STATUS_EXECUTION_FAILED  the function failed to launch on the GPU

cublasXtspmm()

cublasStatus_t cublasXtSspmm(cublasXtHandle_t handle, cublasSideMode_t side, cublasFillMode_t uplo, size_t m, size_t n, const float *alpha, const float *AP, const float *B, size_t ldb, const float *beta, float *C, size_t ldc)
cublasStatus_t cublasXtDspmm(cublasXtHandle_t handle, cublasSideMode_t side, cublasFillMode_t uplo, size_t m, size_t n, const double *alpha, const double *AP, const double *B, size_t ldb, const double *beta, double *C, size_t ldc)
cublasStatus_t cublasXtCspmm(cublasXtHandle_t handle, cublasSideMode_t side, cublasFillMode_t uplo, size_t m, size_t n, const cuComplex *alpha, const cuComplex *AP, const cuComplex *B, size_t ldb, const cuComplex *beta, cuComplex *C, size_t ldc)
cublasStatus_t cublasXtZspmm(cublasXtHandle_t handle, cublasSideMode_t side, cublasFillMode_t uplo, size_t m, size_t n, const cuDoubleComplex *alpha, const cuDoubleComplex *AP, const cuDoubleComplex *B, size_t ldb, const cuDoubleComplex *beta, cuDoubleComplex *C, size_t ldc)

This function performs the symmetric packed matrix-matrix multiplication

C = alpha * A * B + beta * C   if side == CUBLAS_SIDE_LEFT
C = alpha * B * A + beta * C   if side == CUBLAS_SIDE_RIGHT

where A is an m x m (if side == CUBLAS_SIDE_LEFT) or n x n (if side == CUBLAS_SIDE_RIGHT) symmetric matrix stored in packed format, B and C are m x n matrices, and alpha and beta are scalars.

Param.  Memory          In/out  Meaning
handle                  input   handle to the cuBLASXt API context.
side                    input   indicates if matrix A is on the left or right of B.
uplo                    input   indicates if matrix A lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
m                       input   number of rows of matrix C and B, with matrix A sized accordingly.
n                       input   number of columns of matrix C and B, with matrix A sized accordingly.
alpha   host            input   scalar used for multiplication.
AP      host            input   array with the matrix A stored in packed format.
B       host or device  input   array of dimension ldb x n with ldb>=max(1,m).
ldb                     input   leading dimension of two-dimensional array used to store matrix B.
beta    host            input   scalar used for multiplication, if beta == 0 then C does not have to be a valid input.
C       host or device  in/out  array of dimension ldc x n with ldc>=max(1,m).
ldc                     input   leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value                     Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized
CUBLAS_STATUS_INVALID_VALUE     the parameters m,n<0
CUBLAS_STATUS_NOT_SUPPORTED     the matrix AP is located on a GPU device
CUBLAS_STATUS_EXECUTION_FAILED  the function failed to launch on the GPU
