
Make CSRMatrix GPU compatible#244

Closed
EmilyBourne wants to merge 10 commits into SciCompMod:main from EmilyBourne:ebourne_csr_to_gpu_compat
Closed

Make CSRMatrix GPU compatible#244
EmilyBourne wants to merge 10 commits into
SciCompMod:mainfrom
EmilyBourne:ebourne_csr_to_gpu_compat

Conversation

@EmilyBourne
Collaborator

Merge Request - Guideline Checklist

Guideline for checking code before resolving WIP status and approving, respectively.
As many checkboxes as possible should be ticked.

Checks by code author:

Always to be checked:

  • There is at least one issue associated with the pull request.
  • New code adheres to the coding guidelines
  • No large data files have been added to the repository. Maximum size for files should be of the order of KB, not MB. In particular, avoid adding pdf, word, or other files that cannot be change-tracked correctly by git.

If functions were changed or functionality was added:

  • Tests for new functionality have been added
  • A local test was successful

If new functionality was added:

  • There is appropriate documentation of your work. (use doxygen style comments)

If new third party software is used:

  • Did you pay attention to its license? Please remember to add it to the wiki after successful merging.

If new mathematical methods or epidemiological terms are used:

  • Are new methods referenced? Did you provide further documentation?

Checks by code reviewer(s):

  • Is the code clean of development artifacts, e.g., unnecessary comments, prints, ...?
  • The ticket goals for each associated issue are reached or problems are clearly addressed (i.e., a new issue was introduced).
  • There are appropriate unit tests and they pass.
  • The git history is clean and linearized for the merge request. All reviewers should squash commits and write a simple and meaningful commit message.
  • Coverage report for new code is acceptable.
  • No large data files have been added to the repository. Maximum size for files should be of the order of KB, not MB. In particular, avoid adding pdf, word, or other files that cannot be change-tracked correctly by git.

@EmilyBourne EmilyBourne force-pushed the ebourne_csr_to_gpu_compat branch from 2132ce1 to 96fb608 on May 6, 2026 07:47
@EmilyBourne EmilyBourne changed the title WIP: Make CSRMatrix GPU compatible Make CSRMatrix GPU compatible May 6, 2026
@EmilyBourne EmilyBourne marked this pull request as ready for review May 6, 2026 07:48

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.74%. Comparing base (5952563) to head (f1bd270).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #244      +/-   ##
==========================================
+ Coverage   90.72%   90.74%   +0.02%     
==========================================
  Files          86       86              
  Lines        9473     9499      +26     
==========================================
+ Hits         8594     8620      +26     
  Misses        879      879              


@EmilyBourne EmilyBourne requested a review from julianlitz May 6, 2026 12:20
Collaborator

@julianlitz julianlitz left a comment


[image attachment]

@julianlitz
Copy link
Copy Markdown
Collaborator

julianlitz commented May 6, 2026

We can also just define the COO/CSR similar to the batched tridiagonal solver.

There using AllocatableVector is enough to make it Kokkos compliant.

@EmilyBourne
Collaborator Author

We can also just define the COO/CSR similar to the batched tridiagonal solver.

There using AllocatableVector is enough to make it Kokkos compliant.

That's just because we haven't started calling it from Kokkos loops yet. Once we do, we will see that functions such as:

KOKKOS_INLINE_FUNCTION
T& cyclic_corner(const int batch_idx)
{
    return sub_diagonal_(batch_idx * matrix_dimension_ + (matrix_dimension_ - 1));
}

will become uncallable (as objects are captured by const copy). At that point the only way to distinguish between a getter and a setter is by changing the internal type, as I have done here.

@julianlitz
Collaborator

[image attachment]

In #239 I already capture the tridiagonal batched matrices by copy in the Kokkos lambda, and it works.

@julianlitz
Collaborator

julianlitz commented May 7, 2026

It is probably not necessary to do the const pointer tricks, as far as I can tell.

Don't supply custom copy or move semantics; using AllocatableVector for the members, with const setters and getters, seems to work fine.

The const pointer things are not needed, right?

@julianlitz
Collaborator

[image attachments]

@EmilyBourne
Collaborator Author

EmilyBourne commented May 7, 2026

[image attachment]

In #239 I already capture the tridiagonal batched matrices by copy in the Kokkos lambda, and it works.

Ah yes, this should work too. If you have an explicitly const getter and setter then there is no ambiguity. The design that I propose here is intended to minimise the number of changes in the rest of the code and keep readability there: we keep the mat.row_nz_entry(2, 3) += 4 syntax instead of needing the more verbose mat.set_row_nz_entry(2, 3, mat.row_nz_entry(2, 3) + 4). It also stays closer to the standard way of expressing vector constness in Kokkos.

The decision between the two options comes down to:

  • How much code would we need to modify? (if there are only a couple of calls to these getters/setters then it is probably not worth the loss of readability in the matrix class itself)
  • What kind of code people are used to writing and what errors they are likely to make.
    • With this solution the potential risks/disadvantages are:
      • Accidental shallow copy instead of deep copy by writing SparseMatrixCSR<double> m2(m1); (note that SparseMatrixCSR<double> m2 = m1; will not compile so if this is what most people write then this risk is minimal)
      • The matrix code could be harder to read
      • People may forget to write SparseMatrixCSR<const double> leading to non-const objects where consts are wanted
      • Twice as many possible template instantiations can slow down compilation
    • With your solution the potential risks/disadvantages are:
      • Non-GPU compatible code being added in the future that needs to be repeatedly corrected. This would be fixed by removing the host-only setter
      • More verbose code could be harder to read
      • It is not possible to enforce constness
        • On main to enforce constness const SparseMatrixCSR<double> prevents the setter being called
        • On this branch to enforce constness SparseMatrixCSR<const double> prevents the setter being called
        • With your solution the setter can always be called

@EmilyBourne
Collaborator Author

Don't supply custom copy or move semantics; using AllocatableVector for the members, with const setters and getters, seems to work fine.

The custom move semantics are fine. They are not used by Kokkos.

@julianlitz
Collaborator

julianlitz commented May 7, 2026

Small comments improvements:

#pragma once

#include <algorithm>
#include <cassert>
#include <cmath>
#include <fstream>
#include <functional>
#include <iostream>
#include <limits>
#include <memory>
#include <omp.h>
#include <optional>
#include <sstream>
#include <tuple>
#include <unistd.h>
#include <unordered_map>
#include <vector>

#include "../../LinearAlgebra/Vector/vector.h"
#include "../../LinearAlgebra/Vector/vector_operations.h"

// The CSR matrix format is used if a custom LU decomposition solver is chosen.
// MUMPS relies on the COO format.

namespace gmgpolar
{

template <typename T>
class SparseMatrixCSR
{
public:
    using non_const_element_type = std::remove_const_t<T>;
    using int_element_type       = std::conditional_t<std::is_const_v<T>, const int, int>;

    using const_type     = SparseMatrixCSR<const T>;
    using non_const_type = SparseMatrixCSR<non_const_element_type>;

    using triplet_type = std::tuple<int, int, non_const_element_type>;

    // Default constructor.
    SparseMatrixCSR();

    // Copy constructor — A shallow copy is intentional here
    KOKKOS_DEFAULTED_FUNCTION SparseMatrixCSR(const SparseMatrixCSR& other) = default;

    // Move constructor — takes ownership instead of sharing.
    KOKKOS_FUNCTION SparseMatrixCSR(SparseMatrixCSR&& other) noexcept;

    // Converting constructor: SparseMatrixCSR<T2> -> SparseMatrixCSR<const T2>.
    // This constructor only exists for the instantiation SparseMatrixCSR<const double>,
    // and accepts a SparseMatrixCSR<double> argument.
    // The resulting const matrix shares storage with the source but exposes
    // only read-only access through T and int_element_type.
    template <typename T2,
              std::enable_if_t<std::is_same_v<non_const_element_type, T2> && std::is_const_v<T>, bool> = true>
    KOKKOS_FUNCTION SparseMatrixCSR(const SparseMatrixCSR<T2>& other);

    explicit SparseMatrixCSR(int rows, int columns, std::function<int(int)> nz_per_row);
    explicit SparseMatrixCSR(int rows, int columns, const std::vector<triplet_type>& entries);
    explicit SparseMatrixCSR(int rows, int columns, const std::vector<non_const_element_type>& values,
                             const std::vector<int>& column_indices, const std::vector<int>& row_start_indices);

    // Copy assignment is deleted.
    // Copying is only allowed through the copy constructor or copy() below,
    // making ownership transfer explicit and preventing accidental shallow
    // copies that silently share mutable storage.
    SparseMatrixCSR& operator=(const SparseMatrixCSR& other) = delete;
    // Move assignment — transfers ownership.
    KOKKOS_DEFAULTED_FUNCTION SparseMatrixCSR& operator=(SparseMatrixCSR&& other) noexcept = default;

    // Converting assignment: SparseMatrixCSR<double> -> SparseMatrixCSR<const double>.
    template <typename T2,
              std::enable_if_t<std::is_same_v<std::remove_const_t<T>, T2> && std::is_const_v<T>, bool> = true>
    KOKKOS_FUNCTION SparseMatrixCSR& operator=(const SparseMatrixCSR<T2>& other);

    non_const_type copy() const;
    const_type to_const() const;

    KOKKOS_FUNCTION int rows() const;
    KOKKOS_FUNCTION int columns() const;
    KOKKOS_FUNCTION int non_zero_size() const;
    KOKKOS_FUNCTION int row_nz_size(int row) const;

    KOKKOS_FUNCTION int_element_type& row_nz_index(int row, int nz_index) const;
    KOKKOS_FUNCTION T& row_nz_entry(int row, int nz_index) const;

    KOKKOS_FUNCTION T* values_data() const;
    KOKKOS_FUNCTION int_element_type* column_indices_data() const;
    KOKKOS_FUNCTION int_element_type* row_start_indices_data() const;

    template <typename U>
    friend std::ostream& operator<<(std::ostream& stream, const SparseMatrixCSR<U>& matrix);

    template <typename T2>
    friend class SparseMatrixCSR;

private:
    int rows_;
    int columns_;
    int nnz_;
    AllocatableVector<T> values_;
    AllocatableVector<int_element_type> column_indices_;
    AllocatableVector<int_element_type> row_start_indices_;

    bool is_sorted_entries(const std::vector<std::tuple<int, int, non_const_element_type>>& entries)
    {
        for (size_t i = 1; i < entries.size(); ++i) {
            const auto& prev = entries[i - 1];
            const auto& curr = entries[i];
            if (std::get<0>(prev) > std::get<0>(curr))
                return false;
        }
        return true;
    }
};

template <typename U>
std::ostream& operator<<(std::ostream& stream, const SparseMatrixCSR<U>& matrix)
{
    // Create host mirrors and deep_copy from device if necessary.
    // If the View is already on host, deep_copy is a no-op.
    auto h_values            = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace{}, matrix.values_);
    auto h_column_indices    = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace{}, matrix.column_indices_);
    auto h_row_start_indices = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace{}, matrix.row_start_indices_);

    stream << "SparseMatrixCSR: " << matrix.rows_ << " x " << matrix.columns_ << "\n";
    stream << "Number of non-zeros (nnz): " << matrix.nnz_ << "\n";
    stream << "Non-zero elements (row, column, value):\n";
    for (int row = 0; row < matrix.rows_; ++row)
        for (int nnz = h_row_start_indices(row); nnz < h_row_start_indices(row + 1); ++nnz)
            stream << "(" << row << ", " << h_column_indices(nnz) << ", " << h_values(nnz) << ")\n";

    return stream;
}

// default construction
template <typename T>
SparseMatrixCSR<T>::SparseMatrixCSR()
    : rows_(0)
    , columns_(0)
    , nnz_(0)
{
}

// explicit deep copy
template <typename T>
SparseMatrixCSR<T>::non_const_type SparseMatrixCSR<T>::copy() const
{
    non_const_type new_copy;
    new_copy.rows_              = rows_;
    new_copy.columns_           = columns_;
    new_copy.nnz_               = nnz_;
    new_copy.values_            = AllocatableVector<non_const_element_type>("CSR values", nnz_);
    new_copy.column_indices_    = AllocatableVector<int>("CSR column indices", nnz_);
    new_copy.row_start_indices_ = AllocatableVector<int>("CSR row start indices", rows_ + 1);
    Kokkos::deep_copy(new_copy.values_, values_);
    Kokkos::deep_copy(new_copy.column_indices_, column_indices_);
    Kokkos::deep_copy(new_copy.row_start_indices_, row_start_indices_);

    return new_copy;
}

// move construction
template <typename T>
KOKKOS_FUNCTION SparseMatrixCSR<T>::SparseMatrixCSR(SparseMatrixCSR&& other) noexcept
    : rows_(other.rows_)
    , columns_(other.columns_)
    , nnz_(other.nnz_)
    , values_(std::move(other.values_))
    , column_indices_(std::move(other.column_indices_))
    , row_start_indices_(std::move(other.row_start_indices_))
{
    other.nnz_     = 0;
    other.rows_    = 0;
    other.columns_ = 0;
}

// const cast construction
template <typename T>
template <typename T2, std::enable_if_t<std::is_same_v<std::remove_const_t<T>, T2> && std::is_const_v<T>, bool>>
KOKKOS_FUNCTION SparseMatrixCSR<T>::SparseMatrixCSR(const SparseMatrixCSR<T2>& other)
    : rows_(other.rows_)
    , columns_(other.columns_)
    , nnz_(other.nnz_)
    , values_(other.values_)
    , column_indices_(other.column_indices_)
    , row_start_indices_(other.row_start_indices_)
{
}

template <typename T>
template <typename T2, std::enable_if_t<std::is_same_v<std::remove_const_t<T>, T2> && std::is_const_v<T>, bool>>
KOKKOS_FUNCTION SparseMatrixCSR<T>& SparseMatrixCSR<T>::operator=(const SparseMatrixCSR<T2>& other)
{
    rows_              = other.rows_;
    columns_           = other.columns_;
    nnz_               = other.nnz_;
    values_            = other.values_;
    column_indices_    = other.column_indices_;
    row_start_indices_ = other.row_start_indices_;
    return *this;
}

template <typename T>
SparseMatrixCSR<T>::SparseMatrixCSR(int rows, int columns, std::function<int(int)> nz_per_row)
    : rows_(rows)
    , columns_(columns)
{
    assert(rows >= 0);
    assert(columns >= 0);

    Vector<int> row_start_indices("CSR row start indices", rows_ + 1);

    nnz_ = 0;
    for (int i = 0; i < rows; i++) {
        row_start_indices(i) = nnz_;
        nnz_ += nz_per_row(i);
    }
    row_start_indices(rows) = nnz_;

    Vector<non_const_element_type> values("CSR values", nnz_);
    Vector<int> column_indices("CSR column indices", nnz_);

    assign(values, T(0));
    assign(column_indices, 0);

    values_            = values;
    column_indices_    = column_indices;
    row_start_indices_ = row_start_indices;
}

template <typename T>
SparseMatrixCSR<T>::SparseMatrixCSR(int rows, int columns, const std::vector<triplet_type>& entries)
    : // entries: row_idx, col_idx, value
    rows_(rows)
    , columns_(columns)
    , nnz_(entries.size())
{
    assert(rows >= 0);
    assert(columns >= 0);
    assert(is_sorted_entries(entries) && "Entries must be sorted by row!");

    Vector<non_const_element_type> values("CSR values", nnz_);
    Vector<int> column_indices("CSR column indices", nnz_);
    Vector<int> row_start_indices("CSR row start indices", rows_ + 1);

    // fill values and column indexes
    for (int i = 0; i < nnz_; i++) {
        assert(0 <= std::get<0>(entries[i]) && std::get<0>(entries[i]) < rows);
        values(i)         = std::get<2>(entries[i]);
        column_indices(i) = std::get<1>(entries[i]);
        assert(0 <= column_indices(i) && column_indices(i) < columns);
    }
    //fill row indexes
    int count            = 0;
    row_start_indices(0) = 0;
    for (int r = 0; r < rows; r++) {
        while (count < nnz_ && std::get<0>(entries[count]) == r)
            count++;
        row_start_indices(r + 1) = count;
    }
    assert(row_start_indices(rows) == nnz_);

    values_            = values;
    column_indices_    = column_indices;
    row_start_indices_ = row_start_indices;
}

template <typename T>
SparseMatrixCSR<T>::SparseMatrixCSR(int rows, int columns, const std::vector<non_const_element_type>& values,
                                    const std::vector<int>& column_indices, const std::vector<int>& row_start_indices)
    : rows_(rows)
    , columns_(columns)
    , nnz_(values.size())
{
    assert(rows >= 0);
    assert(columns >= 0);
    assert(row_start_indices.size() == static_cast<size_t>(rows + 1));
    assert(values.size() == column_indices.size());

    Vector<non_const_element_type> csr_values("CSR values", nnz_);
    Vector<int> csr_column_indices("CSR column indices", nnz_);
    Vector<int> csr_row_start_indices("CSR row start indices", rows_ + 1);

    // Copy data to internal storage
    std::copy(values.begin(), values.end(), csr_values.data());
    std::copy(column_indices.begin(), column_indices.end(), csr_column_indices.data());
    std::copy(row_start_indices.begin(), row_start_indices.end(), csr_row_start_indices.data());

    values_            = csr_values;
    column_indices_    = csr_column_indices;
    row_start_indices_ = csr_row_start_indices;
}

template <typename T>
SparseMatrixCSR<const T> SparseMatrixCSR<T>::to_const() const
{
    return const_type(*this);
}

template <typename T>
KOKKOS_FUNCTION int SparseMatrixCSR<T>::rows() const
{
    assert(this->rows_ >= 0);
    return this->rows_;
}
template <typename T>
KOKKOS_FUNCTION int SparseMatrixCSR<T>::columns() const
{
    assert(this->columns_ >= 0);
    return this->columns_;
}
template <typename T>
KOKKOS_FUNCTION int SparseMatrixCSR<T>::non_zero_size() const
{
    assert(this->nnz_ >= 0);
    return this->nnz_;
}

template <typename T>
KOKKOS_FUNCTION int SparseMatrixCSR<T>::row_nz_size(int row) const
{
    assert(row >= 0 && row < rows_);
    return row_start_indices_(row + 1) - row_start_indices_(row);
}

template <typename T>
KOKKOS_FUNCTION SparseMatrixCSR<T>::int_element_type& SparseMatrixCSR<T>::row_nz_index(int row, int nz_index) const
{
    assert(row >= 0 && row < rows_);
    assert(nz_index >= 0 && nz_index < row_nz_size(row));
    return column_indices_(row_start_indices_(row) + nz_index);
}

template <typename T>
KOKKOS_FUNCTION T& SparseMatrixCSR<T>::row_nz_entry(int row, int nz_index) const
{
    assert(row >= 0 && row < rows_);
    assert(nz_index >= 0 && nz_index < row_nz_size(row));
    return values_(row_start_indices_(row) + nz_index);
}

template <typename T>
KOKKOS_FUNCTION T* SparseMatrixCSR<T>::values_data() const
{
    return values_.data();
}

template <typename T>
KOKKOS_FUNCTION SparseMatrixCSR<T>::int_element_type* SparseMatrixCSR<T>::column_indices_data() const
{
    return column_indices_.data();
}

template <typename T>
KOKKOS_FUNCTION SparseMatrixCSR<T>::int_element_type* SparseMatrixCSR<T>::row_start_indices_data() const
{
    return row_start_indices_.data();
}
} // namespace gmgpolar

@julianlitz julianlitz self-requested a review May 7, 2026 23:22
@EmilyBourne
Collaborator Author

@julianlitz

Small comments improvements:

I agree with nearly all these changes, but why do you suggest removing an assertion:

assert(static_cast<size_t>(this->nnz_) <= static_cast<size_t>(this->rows_) * static_cast<size_t>(this->columns_));

?

@julianlitz
Collaborator

@julianlitz

Small comments improvements:

I agree with nearly all these changes, but why do you suggest removing an assertion:

assert(static_cast<size_t>(this->nnz_) <= static_cast<size_t>(this->rows_) * static_cast<size_t>(this->columns_));

?

Lots of annoying static casts just to prevent overflow.

@EmilyBourne
Collaborator Author

@julianlitz I have created #250 to compare the amount of code required between the two methods. I am leaning towards #250 in the end. It changes about 100 fewer lines of code. It is a shame that const no longer enforces constness, but by only having an explicit setter it should be clear when the matrix is modified.

@julianlitz
Collaborator

@EmilyBourne Sure, go for #250 then.

@EmilyBourne EmilyBourne closed this May 8, 2026
@EmilyBourne EmilyBourne deleted the ebourne_csr_to_gpu_compat branch May 8, 2026 13:52