Commit a488de2

ready for CRAN
1 parent aa81270

3 files changed

Lines changed: 34 additions & 4 deletions


NEWS.md

Lines changed: 12 additions & 1 deletion

```diff
@@ -1,4 +1,15 @@
-# filearray (development version)
+# filearray 0.1.1
+
+* Added `OpenMP` flag in the `Makevars`
+* Fixed critical bugs that could cause `segfaults`
+* Re-implemented read/write functions to use memory mapping
+* Allowed `dimnames` to be set
+* Added generic `subset` to subset arrays using `dimnames`
+* Added vignette to compare performance
+* Added speed comparisons in `README.md`
+* Added `collapse` to calculate marginal summations with little memory overhead
+* Added `fmap` and `fmap2` to apply functions to one or multiple file arrays with little memory overhead (also very fast)
+* Fixed 'unprotected' issues reported by `rchk`
 
 # filearray 0.1.0
 
```

README.md

Lines changed: 7 additions & 1 deletion

```diff
@@ -11,7 +11,7 @@ Stores large arrays in files to avoid occupying large memories. Implemented with
 
 ![](https://raw.githubusercontent.com/dipterix/filearray/main/adhoc/readme-speed.png)
 
-<small> *Speed comparisons with `lazyarray` (`zstd`-compressed out-of-memory array), and in-memory operation. `filearray` is uniformly faster than `lazyarray`. Random access has almost the same speed as the native in-memory operation. The speed test was performed on an `MacBook Air (M1, 2020)` with 8GB memory* </small>
+<small> *Speed comparisons with `lazyarray` (`zstd`-compressed out-of-memory array), and in-memory operation. The speed test was conducted on a `MacBook Air (M1, 2020, 8GB RAM)`. `filearray` is uniformly faster than `lazyarray`. Random access has almost the same speed as the native in-memory operation.* </small>
 
 ## Installation
 
@@ -150,4 +150,10 @@ hence use with caution when data needs high precision or the max is super large.
 
 3. `collapse` function: when data range is large (say `x[[1]]=1`, but `x[[2]]=10^20`), `collapse` method might lose precision. This is because `double` only uses 8 bytes of memory space. When calculating summations, R internally uses `long double` to prevent precision loss, but the current `filearray` implementation uses `double`, causing floating-point errors around the 16th decimal place.
 
+#### III. Cold-start vs warm-start
+
+As of version `0.1.1`, most file read/write operations have switched from `fopen` to memory mapping, both to simplify the logic (buffer size, kernel cache...) and to boost the speed of writing and of some types of reading. Reading large blocks of data became slower (from 2.4GB/s to 1.7GB/s), but the writing speed increased from 300MB/s to 700MB/s, and randomly accessing small slices of data went from 900MB/s to 2.5GB/s. As a result, some functions can reach very high speeds (close to in-memory calls) while using much less memory.
+
+The performance gains from memory mapping can be reduced by a "cold" start. When reading or writing files, most modern systems cache them so that they load faster the next time. I call the first access, before the file is cached, a cold start. Memory mapping has a little extra overhead during a cold start, which lowers performance (though it is still fast). Accessing the same data after the cold start is a warm start. With warm starts, `filearray` is as fast as native R arrays (sometimes even faster, thanks to its indexing method and fewer garbage collections). This means `filearray` reaches its best performance when arrays are re-used.
+
 
```

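The `collapse` caveat in the README context above comes down to accumulator width: a `double` accumulator rounds after every addition, while a wider accumulator (R uses `long double` for its own sums) rounds only once at the end. The sketch below is illustrative only and is not `filearray` code. Python's standard library has no `long double` arithmetic, so `decimal.Decimal` with 40 significant digits stands in for the wider accumulator; the function names `sum_double` and `sum_wide` are hypothetical. The input uses 1e16 (rather than the README's 10^20) because adjacent doubles near 1e16 are exactly 2 apart, so adding 1.0 is visibly lost.

```python
from decimal import Decimal, localcontext


def sum_double(values):
    """Accumulate in a plain 64-bit float: one rounding per addition."""
    acc = 0.0
    for v in values:
        # Explicit loop on purpose: newer Pythons' built-in sum() compensates for rounding.
        acc += v
    return acc


def sum_wide(values, digits=40):
    """Accumulate in a wider type and round to float only once at the end.

    Decimal is a stand-in for C's long double here (an assumption for
    illustration). Decimal(float) converts each double exactly, so the
    only rounding to double precision happens in the final float().
    """
    with localcontext() as ctx:
        ctx.prec = digits
        acc = Decimal(0)
        for v in values:
            acc += Decimal(v)
        return float(acc)


if __name__ == "__main__":
    # Near 1e16 adjacent doubles are 2 apart, so 1e16 + 1.0 rounds back to 1e16.
    x = [1e16, 1.0, -1e16]
    print(sum_double(x))  # 0.0
    print(sum_wide(x))    # 1.0
```

`long double` is not wider than `double` on every compiler and CPU, so how much R's internal sums actually gain over a `double` accumulator is platform dependent.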
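The README attributes the 0.1.1 speed changes to replacing `fopen`-style reads and writes with memory mapping. The sketch below contrasts the two access paths using Python's standard `mmap` and `struct` modules. It is not `filearray`'s implementation (which is compiled code), it assumes a hypothetical raw file of little-endian 8-byte doubles rather than `filearray`'s actual on-disk format, and it does not try to reproduce the quoted throughput numbers. All function names are hypothetical.

```python
import mmap
import os
import struct
import tempfile

# Assumption for illustration: the file is a flat run of little-endian doubles.
DOUBLE = struct.Struct("<d")


def write_doubles(path, values):
    """Create the file with an ordinary buffered write."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))


def read_slice_buffered(path, start, count):
    """Stream-style access (like fopen/fseek/fread): seek, then copy bytes out."""
    with open(path, "rb") as f:
        f.seek(start * DOUBLE.size)
        raw = f.read(count * DOUBLE.size)
    return list(struct.unpack(f"<{count}d", raw))


def read_slice_mmap(path, start, count):
    """Memory-mapped access: the OS pages the file in; we index the mapping."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return [DOUBLE.unpack_from(mm, (start + i) * DOUBLE.size)[0] for i in range(count)]


def write_slice_mmap(path, start, values):
    """Memory-mapped write: store directly into the mapped pages, then flush."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
        for i, v in enumerate(values):
            DOUBLE.pack_into(mm, (start + i) * DOUBLE.size, v)
        mm.flush()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "array.bin")
        write_doubles(path, [float(i) for i in range(10)])
        write_slice_mmap(path, 2, [20.0, 30.0])
        print(read_slice_buffered(path, 1, 4))  # [1.0, 20.0, 30.0, 4.0]
        print(read_slice_mmap(path, 1, 4))      # [1.0, 20.0, 30.0, 4.0]
```

With the buffered path, every `read()` copies bytes into a fresh buffer; with the mapping, the operating system loads pages into the process on first touch and later accesses index into them directly. Those first-touch page faults are one plausible source of the cold-start overhead described above; once the pages sit in the OS cache, repeated accesses are warm.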
cran-comments.md

Lines changed: 15 additions & 2 deletions

````diff
@@ -8,6 +8,19 @@
 
 ## R CMD check results
 
-0 errors | 0 warnings | 1 note
+On R-4.X
+0 errors | 0 warnings | 0 notes
 
-* This is a new release.
+On R-3.6
+0 errors | 1 warning | 0 notes
+
+```
+Codoc mismatches from documentation object 'apply':
+apply
+Code: function(X, MARGIN, FUN, ...)
+Docs: function(X, MARGIN, FUN, ..., simplify = TRUE)
+Argument names in docs not in code:
+simplify
+```
+
+This is because the `simplify` argument was added to the `apply` function in R-4.0.
````
