CUDA Networks
matrix_transpose.cu File Reference

Implementation of the Matrix::transpose method for GPU-accelerated matrix transposition.

#include "matrix.h"
#include <cuda_runtime.h>
#include <stdexcept>
#include <string>

Functions

__global__ void matrixTransposeKernel (const double *input, double *output, int rows, int cols)
 CUDA kernel for matrix transposition.
 

Detailed Description

Implementation of the Matrix::transpose method for GPU-accelerated matrix transposition.

Definition in file matrix_transpose.cu.
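
The host-side Matrix::transpose method itself is not reproduced on this page. As a rough sketch of how a kernel with this signature is typically driven from the host, the hypothetical helper below allocates device buffers, launches matrixTransposeKernel, and copies the result back. The name transposeOnDevice, the 16x16 block size, and the error-handling style are illustrative assumptions, not the actual implementation in this file.

    // Hypothetical host-side driver (sketch only, not the actual Matrix::transpose code).
    // Assumes the same headers as this file: <cuda_runtime.h>, <stdexcept>, <string>.
    void transposeOnDevice(const double* h_in, double* h_out, int rows, int cols) {
        double* d_in = nullptr;
        double* d_out = nullptr;
        size_t bytes = static_cast<size_t>(rows) * cols * sizeof(double);

        cudaMalloc((void**)&d_in, bytes);
        cudaMalloc((void**)&d_out, bytes);
        cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

        // One thread per element; the 16x16 block size is an assumption, not taken from the file.
        dim3 block(16, 16);
        dim3 grid((cols + block.x - 1) / block.x,
                  (rows + block.y - 1) / block.y);
        matrixTransposeKernel<<<grid, block>>>(d_in, d_out, rows, cols);

        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess) {
            cudaFree(d_in);
            cudaFree(d_out);
            throw std::runtime_error(std::string("matrixTransposeKernel launch failed: ")
                                     + cudaGetErrorString(err));
        }

        cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
        cudaFree(d_in);
        cudaFree(d_out);
    }

The grid is sized so that blockIdx.x spans the column range and blockIdx.y spans the row range, matching the index computation inside the kernel.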

Function Documentation

◆ matrixTransposeKernel()

__global__ void matrixTransposeKernel (const double * input, double * output, int rows, int cols)

CUDA kernel for matrix transposition.

Parameters
    input     Pointer to the input matrix data.
    output    Pointer to the output (transposed) matrix data.
    rows      Number of rows in the input matrix.
    cols      Number of columns in the input matrix.

Definition at line 18 of file matrix_transpose.cu.

18 {
19     // Calculate global thread indices
20     int row = blockIdx.y * blockDim.y + threadIdx.y;
21     int col = blockIdx.x * blockDim.x + threadIdx.x;
22 
23     // Check if thread is within matrix bounds
24     if (row < rows && col < cols) {
25         // Calculate transposed index and assign value
26         int transposedIdx = col * rows + row;
27         int originalIdx = row * cols + col;
28         output[transposedIdx] = input[originalIdx];
29     }
30 }
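
For intuition about the index arithmetic: with row-major storage, element (row, col) of the rows-by-cols input is written to position (col, row) of the cols-by-rows output. The short CPU loop below is only an illustrative equivalent of that mapping, not code from this file.

    // Illustrative CPU equivalent of the kernel's index mapping (not part of the file).
    // input is rows x cols in row-major order; output becomes cols x rows.
    void transposeReference(const double* input, double* output, int rows, int cols) {
        for (int row = 0; row < rows; ++row) {
            for (int col = 0; col < cols; ++col) {
                output[col * rows + row] = input[row * cols + col];
            }
        }
    }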