
Make CSRMatrix GPU compatible#244

Closed
EmilyBourne wants to merge 10 commits into SciCompMod:main from EmilyBourne:ebourne_csr_to_gpu_compat
Closed

Make CSRMatrix GPU compatible#244
EmilyBourne wants to merge 10 commits into
SciCompMod:mainfrom
EmilyBourne:ebourne_csr_to_gpu_compat

Conversation

@EmilyBourne
Collaborator

Merge Request - Guideline Checklist

Guideline for checking code before resolving WIP status and approving, respectively.
As many checkboxes as possible should be ticked.

Checks by code author:

Always to be checked:

  • There is at least one issue associated with the pull request.
  • New code adheres to the coding guidelines
  • No large data files have been added to the repository. Maximum size for files should be of the order of KB, not MB. In particular, avoid adding pdf, word, or other files that cannot be change-tracked correctly by git.

If functions were changed or functionality was added:

  • Tests for new functionality have been added
  • A local test was successful

If new functionality was added:

  • There is appropriate documentation of your work. (use doxygen style comments)

If new third party software is used:

  • Did you pay attention to its license? Please remember to add it to the wiki after successful merging.

If new mathematical methods or epidemiological terms are used:

  • Are new methods referenced? Did you provide further documentation?

Checks by code reviewer(s):

  • Is the code clean of development artifacts, e.g., unnecessary comments, prints, ...?
  • The ticket goals for each associated issue are reached or problems are clearly addressed (i.e., a new issue was introduced).
  • There are appropriate unit tests and they pass.
  • The git history is clean and linearized for the merge request. All reviewers should squash commits and write a simple and meaningful commit message.
  • Coverage report for new code is acceptable.
  • No large data files have been added to the repository. Maximum size for files should be of the order of KB, not MB. In particular, avoid adding pdf, word, or other files that cannot be change-tracked correctly by git.

@EmilyBourne EmilyBourne force-pushed the ebourne_csr_to_gpu_compat branch from 2132ce1 to 96fb608 on May 6, 2026 07:47
@EmilyBourne EmilyBourne changed the title WIP: Make CSRMatrix GPU compatible Make CSRMatrix GPU compatible May 6, 2026
@EmilyBourne EmilyBourne marked this pull request as ready for review May 6, 2026 07:48

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.74%. Comparing base (5952563) to head (f1bd270).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #244      +/-   ##
==========================================
+ Coverage   90.72%   90.74%   +0.02%     
==========================================
  Files          86       86              
  Lines        9473     9499      +26     
==========================================
+ Hits         8594     8620      +26     
  Misses        879      879              


@EmilyBourne EmilyBourne requested a review from julianlitz May 6, 2026 12:20
Collaborator

@julianlitz julianlitz left a comment


[image attachment]

@julianlitz
Copy link
Copy Markdown
Collaborator

julianlitz commented May 6, 2026

We can also just define the COO/CSR similar to the batched tridiagonal solver.

There using AllocatableVector is enough to make it Kokkos compliant.

@EmilyBourne
Collaborator Author

We can also just define the COO/CSR similar to the batched tridiagonal solver.

There using AllocatableVector is enough to make it Kokkos compliant.

That's just because we haven't started calling it from Kokkos loops yet. Once we do, we will see that functions such as:

KOKKOS_INLINE_FUNCTION
T& cyclic_corner(const int batch_idx)
{
    return sub_diagonal_(batch_idx * matrix_dimension_ + (matrix_dimension_ - 1));
}

will become uncallable (as objects are captured by const copy). At that point the only way to distinguish between a getter and a setter is by changing the internal type, as I have done here.

@julianlitz
Collaborator

[image attachment]

In #239 I already capture the tridiagonal batched matrices by copy in the Kokkos lambda, and it works.

@julianlitz
Collaborator

julianlitz commented May 7, 2026

It is probably not necessary to do the const pointer tricks, as far as I can tell.

Don't supply custom copy or move semantics; using AllocatableVector for the members, with const setters and getters, seems to work fine.

The const pointer things are not needed, right?

@julianlitz
Collaborator

[image attachments]

@EmilyBourne
Collaborator Author

EmilyBourne commented May 7, 2026

[image attachment]

In #239 I already capture the tridiagonal batched matrices by copy in the Kokkos lambda, and it works.

Ah yes, this should work too. If you have an explicitly const getter and setter then there is no ambiguity. The design that I propose here is intended to minimise the number of changes in the rest of the code and keep readability there: we keep the mat.row_nz_entry(2, 3) += 4 syntax instead of needing the more verbose mat.set_row_nz_entry(2, 3, mat.row_nz_entry(2, 3) + 4). It also stays closer to the standard way of expressing vector constness in Kokkos.

The decision between the two options comes down to:

  • How much code would we need to modify? (if there are only a couple of calls to these getters/setters then it is probably not worth the loss of readability in the matrix class itself)
  • What kind of code people are used to writing and what errors they are likely to make.
    • With this solution the potential risks/disadvantages are:
      • Accidental shallow copy instead of deep copy by writing SparseMatrixCSR<double> m2(m1); (note that SparseMatrixCSR<double> m2 = m1; will not compile so if this is what most people write then this risk is minimal)
      • The matrix code could be harder to read
      • People may forget to write SparseMatrixCSR<const double> leading to non-const objects where consts are wanted
      • Twice as many possible template instantiations can slow down compilation
    • With your solution the potential risks/disadvantages are:
      • Non-GPU compatible code being added in the future that needs to be repeatedly corrected. This would be fixed by removing the host-only setter
      • More verbose code could be harder to read
      • It is not possible to enforce constness
        • On main to enforce constness const SparseMatrixCSR<double> prevents the setter being called
        • On this branch to enforce constness SparseMatrixCSR<const double> prevents the setter being called
        • With your solution the setter can always be called

@EmilyBourne
Collaborator Author

Don't supply custom copy or move semantics; using AllocatableVector for the members, with const setters and getters, seems to work fine.

The custom move semantics are fine. They are not used by Kokkos.

@julianlitz
Collaborator

julianlitz commented May 7, 2026

Small comments improvements:

#pragma once

#include <algorithm>
#include <cassert>
#include <cmath>
#include <fstream>
#include <functional>
#include <iostream>
#include <limits>
#include <memory>
#include <omp.h>
#include <optional>
#include <sstream>
#include <tuple>
#include <unistd.h>
#include <unordered_map>
#include <vector>

#include "../../LinearAlgebra/Vector/vector.h"
#include "../../LinearAlgebra/Vector/vector_operations.h"

// The CSR matrix format is used if a custom LU decomposition solver is chosen.
// MUMPS relies on the COO format.

namespace gmgpolar
{

template <typename T>
class SparseMatrixCSR
{
public:
    using non_const_element_type = std::remove_const_t<T>;
    using int_element_type       = std::conditional_t<std::is_const_v<T>, const int, int>;

    using const_type     = SparseMatrixCSR<const T>;
    using non_const_type = SparseMatrixCSR<non_const_element_type>;

    using triplet_type = std::tuple<int, int, non_const_element_type>;

    // Default constructor.
    SparseMatrixCSR();

    // Copy constructor — A shallow copy is intentional here
    KOKKOS_DEFAULTED_FUNCTION SparseMatrixCSR(const SparseMatrixCSR& other) = default;

    // Move constructor — takes ownership instead of sharing.
    KOKKOS_FUNCTION SparseMatrixCSR(SparseMatrixCSR&& other) noexcept;

    // Converting constructor: SparseMatrixCSR<T2> -> SparseMatrixCSR<const T2>.
    // This constructor only exists for the instantiation SparseMatrixCSR<const double>,
    // and accepts a SparseMatrixCSR<double> argument.
    // The resulting const matrix shares storage with the source but exposes
    // only read-only access through T and int_element_type.
    template <typename T2,
              std::enable_if_t<std::is_same_v<non_const_element_type, T2> && std::is_const_v<T>, bool> = true>
    KOKKOS_FUNCTION SparseMatrixCSR(const SparseMatrixCSR<T2>& other);

    explicit SparseMatrixCSR(int rows, int columns, std::function<int(int)> nz_per_row);
    explicit SparseMatrixCSR(int rows, int columns, const std::vector<triplet_type>& entries);
    explicit SparseMatrixCSR(int rows, int columns, const std::vector<non_const_element_type>& values,
                             const std::vector<int>& column_indices, const std::vector<int>& row_start_indices);

    // Copy assignment is deleted.
    // Copying is only allowed through the copy constructor or copy() below,
    // making ownership transfer explicit and preventing accidental shallow
    // copies that silently share mutable storage.
    SparseMatrixCSR& operator=(const SparseMatrixCSR& other) = delete;
    // Move assignment — transfers ownership.
    KOKKOS_DEFAULTED_FUNCTION SparseMatrixCSR& operator=(SparseMatrixCSR&& other) noexcept = default;

    // Converting assignment: SparseMatrixCSR<double> -> SparseMatrixCSR<const double>.
    template <typename T2,
              std::enable_if_t<std::is_same_v<std::remove_const_t<T>, T2> && std::is_const_v<T>, bool> = true>
    KOKKOS_FUNCTION SparseMatrixCSR& operator=(const SparseMatrixCSR<T2>& other);

    non_const_type copy() const;
    const_type to_const() const;

    KOKKOS_FUNCTION int rows() const;
    KOKKOS_FUNCTION int columns() const;
    KOKKOS_FUNCTION int non_zero_size() const;
    KOKKOS_FUNCTION int row_nz_size(int row) const;

    KOKKOS_FUNCTION int_element_type& row_nz_index(int row, int nz_index) const;
    KOKKOS_FUNCTION T& row_nz_entry(int row, int nz_index) const;

    KOKKOS_FUNCTION T* values_data() const;
    KOKKOS_FUNCTION int_element_type* column_indices_data() const;
    KOKKOS_FUNCTION int_element_type* row_start_indices_data() const;

    template <typename U>
    friend std::ostream& operator<<(std::ostream& stream, const SparseMatrixCSR<U>& matrix);

    template <typename T2>
    friend class SparseMatrixCSR;

private:
    int rows_;
    int columns_;
    int nnz_;
    AllocatableVector<T> values_;
    AllocatableVector<int_element_type> column_indices_;
    AllocatableVector<int_element_type> row_start_indices_;

    bool is_sorted_entries(const std::vector<std::tuple<int, int, non_const_element_type>>& entries)
    {
        for (size_t i = 1; i < entries.size(); ++i) {
            const auto& prev = entries[i - 1];
            const auto& curr = entries[i];
            if (std::get<0>(prev) > std::get<0>(curr))
                return false;
        }
        return true;
    }
};

template <typename U>
std::ostream& operator<<(std::ostream& stream, const SparseMatrixCSR<U>& matrix)
{
    // Create host mirrors and deep_copy from device if necessary.
    // If the View is already on host, deep_copy is a no-op.
    auto h_values            = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace{}, matrix.values_);
    auto h_column_indices    = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace{}, matrix.column_indices_);
    auto h_row_start_indices = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace{}, matrix.row_start_indices_);

    stream << "SparseMatrixCSR: " << matrix.rows_ << " x " << matrix.columns_ << "\n";
    stream << "Number of non-zeros (nnz): " << matrix.nnz_ << "\n";
    stream << "Non-zero elements (row, column, value):\n";
    for (int row = 0; row < matrix.rows_; ++row)
        for (int nnz = h_row_start_indices(row); nnz < h_row_start_indices(row + 1); ++nnz)
            stream << "(" << row << ", " << h_column_indices(nnz) << ", " << h_values(nnz) << ")\n";

    return stream;
}

// default construction
template <typename T>
SparseMatrixCSR<T>::SparseMatrixCSR()
    : rows_(0)
    , columns_(0)
    , nnz_(0)
{
}

// explicit deep copy
template <typename T>
SparseMatrixCSR<T>::non_const_type SparseMatrixCSR<T>::copy() const
{
    non_const_type new_copy;
    new_copy.rows_              = rows_;
    new_copy.columns_           = columns_;
    new_copy.nnz_               = nnz_;
    new_copy.values_            = AllocatableVector<non_const_element_type>("CSR values", nnz_);
    new_copy.column_indices_    = AllocatableVector<int>("CSR column indices", nnz_);
    new_copy.row_start_indices_ = AllocatableVector<int>("CSR row start indices", rows_ + 1);
    Kokkos::deep_copy(new_copy.values_, values_);
    Kokkos::deep_copy(new_copy.column_indices_, column_indices_);
    Kokkos::deep_copy(new_copy.row_start_indices_, row_start_indices_);

    return new_copy;
}

// move construction
template <typename T>
KOKKOS_FUNCTION SparseMatrixCSR<T>::SparseMatrixCSR(SparseMatrixCSR&& other) noexcept
    : rows_(other.rows_)
    , columns_(other.columns_)
    , nnz_(other.nnz_)
    , values_(std::move(other.values_))
    , column_indices_(std::move(other.column_indices_))
    , row_start_indices_(std::move(other.row_start_indices_))
{
    other.nnz_     = 0;
    other.rows_    = 0;
    other.columns_ = 0;
}

// const cast construction
template <typename T>
template <typename T2, std::enable_if_t<std::is_same_v<std::remove_const_t<T>, T2> && std::is_const_v<T>, bool>>
KOKKOS_FUNCTION SparseMatrixCSR<T>::SparseMatrixCSR(const SparseMatrixCSR<T2>& other)
    : rows_(other.rows_)
    , columns_(other.columns_)
    , nnz_(other.nnz_)
    , values_(other.values_)
    , column_indices_(other.column_indices_)
    , row_start_indices_(other.row_start_indices_)
{
}

template <typename T>
template <typename T2, std::enable_if_t<std::is_same_v<std::remove_const_t<T>, T2> && std::is_const_v<T>, bool>>
KOKKOS_FUNCTION SparseMatrixCSR<T>& SparseMatrixCSR<T>::operator=(const SparseMatrixCSR<T2>& other)
{
    rows_              = other.rows_;
    columns_           = other.columns_;
    nnz_               = other.nnz_;
    values_            = other.values_;
    column_indices_    = other.column_indices_;
    row_start_indices_ = other.row_start_indices_;
    return *this;
}

template <typename T>
SparseMatrixCSR<T>::SparseMatrixCSR(int rows, int columns, std::function<int(int)> nz_per_row)
    : rows_(rows)
    , columns_(columns)
{
    assert(rows >= 0);
    assert(columns >= 0);

    Vector<int> row_start_indices("CSR row start indices", rows_ + 1);

    nnz_ = 0;
    for (int i = 0; i < rows; i++) {
        row_start_indices(i) = nnz_;
        nnz_ += nz_per_row(i);
    }
    row_start_indices(rows) = nnz_;

    Vector<non_const_element_type> values("CSR values", nnz_);
    Vector<int> column_indices("CSR column indices", nnz_);

    assign(values, T(0));
    assign(column_indices, 0);

    values_            = values;
    column_indices_    = column_indices;
    row_start_indices_ = row_start_indices;
}

template <typename T>
SparseMatrixCSR<T>::SparseMatrixCSR(int rows, int columns, const std::vector<triplet_type>& entries)
    : // entries: row_idx, col_idx, value
    rows_(rows)
    , columns_(columns)
    , nnz_(entries.size())
{
    assert(rows >= 0);
    assert(columns >= 0);
    assert(is_sorted_entries(entries) && "Entries must be sorted by row!");

    Vector<non_const_element_type> values("CSR values", nnz_);
    Vector<int> column_indices("CSR column indices", nnz_);
    Vector<int> row_start_indices("CSR row start indices", rows_ + 1);

    // fill values and column indexes
    for (int i = 0; i < nnz_; i++) {
        assert(0 <= std::get<0>(entries[i]) && std::get<0>(entries[i]) < rows);
        values(i)         = std::get<2>(entries[i]);
        column_indices(i) = std::get<1>(entries[i]);
        assert(0 <= column_indices(i) && column_indices(i) < columns);
    }
    //fill row indexes
    int count            = 0;
    row_start_indices(0) = 0;
    for (int r = 0; r < rows; r++) {
        while (count < nnz_ && std::get<0>(entries[count]) == r)
            count++;
        row_start_indices(r + 1) = count;
    }
    assert(row_start_indices(rows) == nnz_);

    values_            = values;
    column_indices_    = column_indices;
    row_start_indices_ = row_start_indices;
}

template <typename T>
SparseMatrixCSR<T>::SparseMatrixCSR(int rows, int columns, const std::vector<non_const_element_type>& values,
                                    const std::vector<int>& column_indices, const std::vector<int>& row_start_indices)
    : rows_(rows)
    , columns_(columns)
    , nnz_(values.size())
{
    assert(rows >= 0);
    assert(columns >= 0);
    assert(row_start_indices.size() == static_cast<size_t>(rows + 1));
    assert(values.size() == column_indices.size());

    Vector<non_const_element_type> csr_values("CSR values", nnz_);
    Vector<int> csr_column_indices("CSR column indices", nnz_);
    Vector<int> csr_row_start_indices("CSR row start indices", rows_ + 1);

    // Copy data to internal storage
    std::copy(values.begin(), values.end(), csr_values.data());
    std::copy(column_indices.begin(), column_indices.end(), csr_column_indices.data());
    std::copy(row_start_indices.begin(), row_start_indices.end(), csr_row_start_indices.data());

    values_            = csr_values;
    column_indices_    = csr_column_indices;
    row_start_indices_ = csr_row_start_indices;
}

template <typename T>
SparseMatrixCSR<const T> SparseMatrixCSR<T>::to_const() const
{
    return const_type(*this);
}

template <typename T>
KOKKOS_FUNCTION int SparseMatrixCSR<T>::rows() const
{
    assert(this->rows_ >= 0);
    return this->rows_;
}
template <typename T>
KOKKOS_FUNCTION int SparseMatrixCSR<T>::columns() const
{
    assert(this->columns_ >= 0);
    return this->columns_;
}
template <typename T>
KOKKOS_FUNCTION int SparseMatrixCSR<T>::non_zero_size() const
{
    assert(this->nnz_ >= 0);
    return this->nnz_;
}

template <typename T>
KOKKOS_FUNCTION int SparseMatrixCSR<T>::row_nz_size(int row) const
{
    assert(row >= 0 && row < rows_);
    return row_start_indices_(row + 1) - row_start_indices_(row);
}

template <typename T>
KOKKOS_FUNCTION SparseMatrixCSR<T>::int_element_type& SparseMatrixCSR<T>::row_nz_index(int row, int nz_index) const
{
    assert(row >= 0 && row < rows_);
    assert(nz_index >= 0 && nz_index < row_nz_size(row));
    return column_indices_(row_start_indices_(row) + nz_index);
}

template <typename T>
KOKKOS_FUNCTION T& SparseMatrixCSR<T>::row_nz_entry(int row, int nz_index) const
{
    assert(row >= 0 && row < rows_);
    assert(nz_index >= 0 && nz_index < row_nz_size(row));
    return values_(row_start_indices_(row) + nz_index);
}

template <typename T>
KOKKOS_FUNCTION T* SparseMatrixCSR<T>::values_data() const
{
    return values_.data();
}

template <typename T>
KOKKOS_FUNCTION SparseMatrixCSR<T>::int_element_type* SparseMatrixCSR<T>::column_indices_data() const
{
    return column_indices_.data();
}

template <typename T>
KOKKOS_FUNCTION SparseMatrixCSR<T>::int_element_type* SparseMatrixCSR<T>::row_start_indices_data() const
{
    return row_start_indices_.data();
}
} // namespace gmgpolar

@julianlitz julianlitz self-requested a review May 7, 2026 23:22
@EmilyBourne
Collaborator Author

@julianlitz

Small comments improvements:

I agree with nearly all these changes, but why do you suggest removing an assertion:

assert(static_cast<size_t>(this->nnz_) <= static_cast<size_t>(this->rows_) * static_cast<size_t>(this->columns_));

?

@julianlitz
Collaborator

@julianlitz

Small comments improvements:

I agree with nearly all these changes, but why do you suggest removing an assertion:

assert(static_cast<size_t>(this->nnz_) <= static_cast<size_t>(this->rows_) * static_cast<size_t>(this->columns_));

?

Lots of annoying static casts just to prevent overflow.

@EmilyBourne
Collaborator Author

@julianlitz I have created #250 to compare the amount of code required between the two methods. I am leaning towards #250 in the end. It changes about 100 fewer lines of code. It is a shame that const no longer enforces constness, but by only having an explicit setter it should be clear when the matrix is modified.

@julianlitz
Collaborator

@EmilyBourne Sure, go for #250 then.

@EmilyBourne EmilyBourne closed this May 8, 2026
@EmilyBourne EmilyBourne deleted the ebourne_csr_to_gpu_compat branch May 8, 2026 13:52