Skip to content

Commit 4c308bf

Browse files
JAi-SATHVIKjvdp1
andauthored
Implement Principal Component Analysis (PCA) module (#1086)
* add PCA to public api * include pca submodule * Add PCA module with `pca`, `pca_transform`, and `pca_inverse_transform` routines. * add PCA unit test * update end interface statement * update CmakeLists * fixed_conflicts * update interface * allined with the other linalg function * convert to subroutines,updated test * fix errors * fixed errors * fix PCA BLAS/LAPACK linking * fix PCA BLAS/LAPACK * fix: remove xdp/qp from PCA use statements to fix CI builds * both updated * test * modify interfaces for core. * add stdlib_sorting.fypp in cmakelists.txt * Fix CMakeLists.txt for the addition of stdlib_storting_pca * Add center_data Helper Subroutine * Replace Manual Mean with stdlib mean * Replace Covariance Loops with BLAS syrk * Extract pca_svd_driver and pca_eigh_driver & Updated Main pca Subroutine * optimized for performance and stability * Cache efficency * fix issues build issues. * Revert "fix issues build issues." This reverts commit 7348faf. * use nested do loops * resolve compiler errors * fix issue * add PCA to public api * include pca submodule * Add PCA module with `pca`, `pca_transform`, and `pca_inverse_transform` routines. * add PCA unit test * update end interface statement * update CmakeLists * fixed_conflicts * update interface * allined with the other linalg function * convert to subroutines,updated test * fix errors * fixed errors * fix PCA BLAS/LAPACK linking * fix PCA BLAS/LAPACK * fix: remove xdp/qp from PCA use statements to fix CI builds * both updated * test * modify interfaces for core. * add stdlib_sorting.fypp in cmakelists.txt * Fix CMakeLists.txt for the addition of stdlib_storting_pca * Add center_data Helper Subroutine * Replace Manual Mean with stdlib mean * Replace Covariance Loops with BLAS syrk * Extract pca_svd_driver and pca_eigh_driver & Updated Main pca Subroutine * optimized for performance and stability * Cache efficency * fix issues build issues. * Revert "fix issues build issues." This reverts commit 7348faf. * use nested do loops * resolve compiler errors * fix issue * remove unused BLAS constants to prevent compiler warnings * remove unused import * remove unused output arrays * remove unused output arrays * fix: replace string concatenation with comma args to fix ifx crash * Use REAL_KINDS_TYPES * Change singular_values * Remove scale_factor variable * fix issues * refactor * remove sort index,lower triangle fill. * Fix eigh: use upper_a instead of lower * update center data subroutine * remove elsewhere clause * update specs * fix * , * update test file * loop fix * update documentation * add checks * add test * add keyword-based transform * fix interface declarations * reorder arguments * update docs and tests * Add explicit shape validation and nc < 1 checks to PCA subroutines --------- Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
1 parent c016c6d commit 4c308bf

7 files changed

Lines changed: 637 additions & 1 deletion

File tree

doc/specs/stdlib_stats.md

Lines changed: 104 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -283,6 +283,110 @@ If `mask` is specified, the result is the _k_-th (central) moment of all elemen
283283
{!example/stats/example_moment.f90!}
284284
```
285285

286+
## `pca` - Principal Component Analysis
287+
288+
### Status
289+
290+
Experimental
291+
292+
### Description
293+
294+
Performs Principal Component Analysis (PCA) on a 2D array of observations and features.
295+
The input matrix `x` has shape `(m, n)`, where `m` is the number of observations and `n` is the number of features.
296+
The subroutine computes the principal components (loadings), the singular values, and optionally the feature means.
297+
298+
Two methods are supported:
299+
- `"svd"`: (Default) Computes PCA via Singular Value Decomposition of the centered data. This is generally more numerically stable.
300+
- `"eig"` or `"cov"`: Computes PCA via Eigendecomposition of the covariance matrix.
301+
302+
### Syntax
303+
304+
`call ` [[stdlib_stats(module):pca(interface)]] `(x, components, singular_values [, x_mean, method, overwrite_x, err])`
305+
306+
### Class
307+
308+
Generic subroutine
309+
310+
### Arguments
311+
312+
`x`: Shall be a rank-2 real array with shape `(m, n)`. It is an `intent(inout)` argument. If `overwrite_x` is `.true.`, `x` may be modified during computation.
313+
314+
`components`: Shall be a rank-2 real array with shape `(n_components, n)`. It stores the principal components as rows. It is an `intent(out)` argument.
315+
316+
`singular_values`: Shall be a rank-1 real array with shape `(n_components)`. It stores the singular values in descending order. It is an `intent(out)` argument.
317+
318+
`x_mean` (optional): Shall be a rank-1 real array with shape `(n)`. It stores the mean of each feature (column). It is an `intent(out)` argument.
319+
320+
`method` (optional): Shall be a character string. Either `"svd"` or `"eig"`/`"cov"`. It is an `intent(in)` argument.
321+
322+
`overwrite_x` (optional): Shall be a scalar of type `logical`. If `.true.`, the input matrix `x` can be used as a workspace and modified. It is an `intent(in)` argument.
323+
324+
`err` (optional): Shall be of type `linalg_state_type`. It is an `intent(out)` argument.
325+
326+
### Example
327+
328+
```fortran
329+
{!example/stats/example_pca.f90!}
330+
```
331+
332+
## `pca_transform` - Projects data into principal component space
333+
334+
### Status
335+
336+
Experimental
337+
338+
### Description
339+
340+
Projects the input data `x` into the reduced dimensional space defined by the provided principal components.
341+
The transformation is defined as `x_transformed = (x - x_mean) * components^T`.
342+
343+
### Syntax
344+
345+
`call ` [[stdlib_stats(module):pca_transform(interface)]] `(x, components, x_transformed [, x_mean])`
346+
347+
### Class
348+
349+
Generic subroutine
350+
351+
### Arguments
352+
353+
`x`: Shall be a rank-2 real array with shape `(m, n)`. It is an `intent(in)` argument.
354+
355+
`components`: Shall be a rank-2 real array with shape `(nc, n)`. It stores the principal components as rows. It is an `intent(in)` argument.
356+
357+
`x_transformed`: Shall be a rank-2 real array with shape `(m, nc)`. It stores the projected data. It is an `intent(out)` argument.
358+
359+
`x_mean` (optional): Shall be a rank-1 real array with shape `(n)`. It stores the feature means to subtract. It is an `intent(in)` argument.
360+
361+
## `pca_inverse_transform` - Reconstructs original data from principal component space
362+
363+
### Status
364+
365+
Experimental
366+
367+
### Description
368+
369+
Reconstructs the original data representation from the principal component space.
370+
The reconstruction is defined as `x_reconstructed = x_reduced * components + x_mean`.
371+
372+
### Syntax
373+
374+
`call ` [[stdlib_stats(module):pca_inverse_transform(interface)]] `(x_reduced, components, x_reconstructed [, x_mean])`
375+
376+
### Class
377+
378+
Generic subroutine
379+
380+
### Arguments
381+
382+
`x_reduced`: Shall be a rank-2 real array with shape `(m, nc)`. It is an `intent(in)` argument.
383+
384+
`components`: Shall be a rank-2 real array with shape `(nc, n)`. It stores the principal components as rows. It is an `intent(in)` argument.
385+
386+
`x_reconstructed`: Shall be a rank-2 real array with shape `(m, n)`. It stores the reconstructed data. It is an `intent(out)` argument.
387+
388+
`x_mean` (optional): Shall be a rank-1 real array with shape `(n)`. It stores the feature means to add back. It is an `intent(in)` argument.
389+
286390
## `var` - variance of array elements
287391

288392
### Status

example/stats/example_pca.f90

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
program example_pca
2+
use stdlib_kinds, only: dp
3+
use stdlib_stats, only: pca, pca_transform, pca_inverse_transform
4+
use stdlib_linalg_state, only: linalg_state_type
5+
implicit none
6+
7+
real(dp) :: x(3, 2), components(2, 2), s(2), mu(2)
8+
real(dp) :: x_trans(3, 2), x_inv(3, 2)
9+
type(linalg_state_type) :: err
10+
integer :: i
11+
12+
! Input data: 3 observations, 2 features
13+
x = reshape([1.0_dp, 3.0_dp, 5.0_dp, 2.0_dp, 4.0_dp, 6.0_dp], [3, 2])
14+
15+
print *, "Original data:"
16+
do i = 1, 3
17+
print "(2f6.2)", x(i, :)
18+
end do
19+
20+
! Perform PCA
21+
call pca(x, components, s, x_mean=mu, err=err)
22+
23+
if (err%ok()) then
24+
print *, ""
25+
print *, "Feature means:", mu
26+
print *, "Singular values:", s
27+
print *, "Principal components (as rows):"
28+
print "(2f6.3)", components(1, :)
29+
print "(2f6.3)", components(2, :)
30+
31+
! Transform data to principal components space
32+
call pca_transform(x, components, x_trans, mu)
33+
print *, ""
34+
print *, "Transformed data (projected):"
35+
do i = 1, 3
36+
print "(2f8.3)", x_trans(i, :)
37+
end do
38+
39+
! Inverse transform to reconstruct original data
40+
call pca_inverse_transform(x_trans, components, x_inv, mu)
41+
print *, ""
42+
print *, "Reconstructed data:"
43+
do i = 1, 3
44+
print "(2f6.2)", x_inv(i, :)
45+
end do
46+
else
47+
print *, "PCA failed: ", err%message
48+
end if
49+
50+
end program example_pca

src/stats/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@ set(stats_fppFiles
55
stdlib_random.fypp
66
stdlib_stats_corr.fypp
77
stdlib_stats_cov.fypp
8+
stdlib_stats_pca.fypp
89
stdlib_stats_distribution_exponential.fypp
910
stdlib_stats_distribution_gamma.fypp
1011
stdlib_stats_distribution_normal.fypp

src/stats/stdlib_stats.fypp

Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,10 +9,12 @@ module stdlib_stats
99
!! ([Specification](../page/specs/stdlib_stats.html))
1010
use stdlib_kinds, only: sp, dp, xdp, qp, &
1111
int8, int16, int32, int64
12+
use stdlib_linalg_state, only: linalg_state_type
1213
implicit none
1314
private
1415
! Public API
1516
public :: corr, cov, mean, median, moment, var
17+
public :: pca, pca_transform, pca_inverse_transform
1618

1719

1820
interface corr
@@ -639,4 +641,57 @@ module stdlib_stats
639641
#:endfor
640642
end interface moment
641643

644+
645+
interface pca
646+
!! version: experimental
647+
!!
648+
!! Principal Component Analysis (PCA)
649+
!! ([Specification](../page/specs/stdlib_stats.html#pca))
650+
#:for k1, t1, ri, cpp in REAL_KINDS_TYPES
651+
module subroutine pca_${k1}$(x, components, singular_values, x_mean, &
652+
method, overwrite_x, err)
653+
${t1}$, intent(inout) :: x(:,:)
654+
${t1}$, intent(out) :: components(:,:)
655+
${t1}$, intent(out) :: singular_values(:)
656+
${t1}$, intent(out), optional :: x_mean(:)
657+
character(*), intent(in), optional :: method
658+
logical, intent(in), optional :: overwrite_x
659+
type(linalg_state_type), intent(out), optional :: err
660+
end subroutine pca_${k1}$
661+
#:endfor
662+
end interface pca
663+
664+
665+
interface pca_transform
666+
!! version: experimental
667+
!!
668+
!! Projects data into the reduced dimensional space
669+
!! ([Specification](../page/specs/stdlib_stats.html#pca_transform))
670+
#:for k1, t1, ri, cpp in REAL_KINDS_TYPES
671+
module subroutine pca_transform_${k1}$(x, components, x_transformed, x_mean)
672+
${t1}$, intent(in) :: x(:,:)
673+
${t1}$, intent(in) :: components(:,:)
674+
${t1}$, intent(out) :: x_transformed(:,:)
675+
${t1}$, intent(in), optional :: x_mean(:)
676+
end subroutine pca_transform_${k1}$
677+
#:endfor
678+
end interface pca_transform
679+
680+
681+
interface pca_inverse_transform
682+
!! version: experimental
683+
!!
684+
!! Reconstructs original data from the reduced space
685+
!! ([Specification](../page/specs/stdlib_stats.html#pca_inverse_transform))
686+
#:for k1, t1, ri, cpp in REAL_KINDS_TYPES
687+
module subroutine pca_inverse_transform_${k1}$(x_reduced, components, x_reconstructed, x_mean)
688+
${t1}$, intent(in) :: x_reduced(:,:)
689+
${t1}$, intent(in) :: components(:,:)
690+
${t1}$, intent(out) :: x_reconstructed(:,:)
691+
${t1}$, intent(in), optional :: x_mean(:)
692+
end subroutine pca_inverse_transform_${k1}$
693+
#:endfor
694+
end interface pca_inverse_transform
695+
696+
642697
end module stdlib_stats

0 commit comments

Comments
 (0)