CUDA Networks
matrix_transpose.cu File Reference

Implementation of the Matrix::transpose method for GPU-accelerated matrix transposition.

#include "matrix.h"
#include <cuda_runtime.h>
#include <stdexcept>
#include <string>

Functions

__global__ void matrixTransposeKernel (const double *input, double *output, int rows, int cols)
 CUDA kernel for matrix transposition.
 

Detailed Description

Implementation of the Matrix::transpose method for GPU-accelerated matrix transposition.

Definition in file matrix_transpose.cu.
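
The host-side Matrix::transpose method itself is not reproduced on this page. As a rough sketch of how a kernel with this signature is typically driven from the host, the hypothetical helper below allocates device buffers, launches matrixTransposeKernel, and copies the result back. The name transposeOnDevice, the 16x16 block size, and the error-handling style are illustrative assumptions, not the actual implementation in this file.

    // Hypothetical host-side driver (sketch only, not the actual Matrix::transpose code).
    // Assumes the same headers as this file: <cuda_runtime.h>, <stdexcept>, <string>.
    void transposeOnDevice(const double* h_in, double* h_out, int rows, int cols) {
        double* d_in = nullptr;
        double* d_out = nullptr;
        size_t bytes = static_cast<size_t>(rows) * cols * sizeof(double);

        cudaMalloc((void**)&d_in, bytes);
        cudaMalloc((void**)&d_out, bytes);
        cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

        // One thread per element; the 16x16 block size is an assumption, not taken from the file.
        dim3 block(16, 16);
        dim3 grid((cols + block.x - 1) / block.x,
                  (rows + block.y - 1) / block.y);
        matrixTransposeKernel<<<grid, block>>>(d_in, d_out, rows, cols);

        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess) {
            cudaFree(d_in);
            cudaFree(d_out);
            throw std::runtime_error(std::string("matrixTransposeKernel launch failed: ")
                                     + cudaGetErrorString(err));
        }

        cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
        cudaFree(d_in);
        cudaFree(d_out);
    }

The grid is sized so that blockIdx.x spans the column range and blockIdx.y spans the row range, matching the index computation inside the kernel.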

Function Documentation

◆ matrixTransposeKernel()

__global__ void matrixTransposeKernel (const double * input, double * output, int rows, int cols)

CUDA kernel for matrix transposition.

Parameters
    input     Pointer to the input matrix data.
    output    Pointer to the output (transposed) matrix data.
    rows      Number of rows in the input matrix.
    cols      Number of columns in the input matrix.

Definition at line 18 of file matrix_transpose.cu.

18 {
19     // Calculate global thread indices
20     int row = blockIdx.y * blockDim.y + threadIdx.y;
21     int col = blockIdx.x * blockDim.x + threadIdx.x;
22 
23     // Check if thread is within matrix bounds
24     if (row < rows && col < cols) {
25         // Calculate transposed index and assign value
26         int transposedIdx = col * rows + row;
27         int originalIdx = row * cols + col;
28         output[transposedIdx] = input[originalIdx];
29     }
30 }
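
For intuition about the index arithmetic: with row-major storage, element (row, col) of the rows-by-cols input is written to position (col, row) of the cols-by-rows output. The short CPU loop below is only an illustrative equivalent of that mapping, not code from this file.

    // Illustrative CPU equivalent of the kernel's index mapping (not part of the file).
    // input is rows x cols in row-major order; output becomes cols x rows.
    void transposeReference(const double* input, double* output, int rows, int cols) {
        for (int row = 0; row < rows; ++row) {
            for (int col = 0; col < cols; ++col) {
                output[col * rows + row] = input[row * cols + col];
            }
        }
    }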