Benchmark `xan count --parallel` and `xan parallel cat -P 'select'` #568

Hi,

Thanks for the interesting benchmark. It matches what I observed on my own. Still, I think you could add two things to it to give a broader picture, since `xan` also knows how to parallelize its computations:

* `xan count --parallel` for the count benchmark
* `xan parallel cat -P 'select <columns>'` for the select benchmark

It seems `zsv` cannot parallelize unless the file is on disk, so I suspect its approach is similar to the one used by `xan`, i.e. cleverly chunking the file in constant time ahead of a map-reduce-like process. Is that right? (Here is how `xan` does it, in any case: https://github.com/medialab/xan/blob/master/docs/blog/csv_base_jumping.md)
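To make the idea concrete, here is a rough sketch of the general "seek, then realign to a record boundary, then map-reduce" shape. It is NOT how `xan` or `zsv` actually do it: it assumes no quoted field ever contains a newline, so every `\n` is a record boundary, which is exactly the hard case the blog post above deals with. `chunk_boundaries`, `count_lines` and `parallel_count` are hypothetical names. Threads are used only to show the map-reduce structure; in CPython they will not give real CPU parallelism here.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def chunk_boundaries(path: str, n_chunks: int) -> list[tuple[int, int]]:
    """Split a file into roughly equal byte ranges that start at line starts.

    Assumption: no quoted field contains a newline, so any b"\\n" ends a record.
    Cost depends on the number of chunks and line length, not on file size.
    """
    size = os.path.getsize(path)
    offsets = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            target = size * i // n_chunks
            if target <= offsets[-1]:
                continue
            f.seek(target)  # jump straight into the middle of the file
            f.readline()  # skip the rest of the (possibly partial) line
            pos = f.tell()
            if offsets[-1] < pos < size:
                offsets.append(pos)
    offsets.append(size)
    return list(zip(offsets[:-1], offsets[1:]))


def count_lines(path: str, start: int, end: int) -> int:
    """Map step: count newline-terminated lines in one byte range."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start).count(b"\n")


def parallel_count(path: str, n_chunks: int = 4) -> int:
    """Map over chunks, then reduce by summing the partial counts."""
    ranges = chunk_boundaries(path, n_chunks)
    with ThreadPoolExecutor() as pool:
        counts = pool.map(lambda r: count_lines(path, *r), ranges)
    return sum(counts)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
        tmp.write(b"name,age\n")
        tmp.write(b"".join(b"p%d,%d\n" % (i, i) for i in range(1000)))
    print(parallel_count(tmp.name))  # header line + 1000 rows
    os.remove(tmp.name)
```

Note that this counts lines, header included, and would miss a final line lacking a trailing newline.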

I am curious whether you ever attempted to leverage `avx512`. Many people claim it can be even faster, but I am skeptical. `xan`'s SIMD parser is not branchless anyway, so I doubt it would gain an edge from it just yet, but maybe yours would?

It could also be worth benchmarking a command that requires unquoting: `xan count` uses a parser that does not even attempt to separate cells, and `xan select` uses a zero-copy, non-unquoting parser since it does not need to do more than shuffle columns around. I suspect your parser would be faster there as well.
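For readers wondering why unquoting changes the cost, here is a small illustration. It is not `xan`'s or `zsv`'s code, and `raw_cell` / `unquote_cell` are hypothetical helpers. A zero-copy parser can hand out a view into the input buffer, quotes included, while producing the logical RFC 4180 value means stripping the enclosing quotes and collapsing doubled quotes, which generally requires allocating a new buffer.

```python
def raw_cell(buf: bytes, start: int, end: int) -> memoryview:
    """Zero-copy: a view into the original buffer, enclosing quotes and all."""
    return memoryview(buf)[start:end]


def unquote_cell(raw: bytes, quote: bytes = b'"') -> bytes:
    """Return the logical value of one RFC 4180 cell (allocates a new bytes)."""
    if len(raw) >= 2 and raw.startswith(quote) and raw.endswith(quote):
        # Drop the enclosing quotes, then turn each escaped "" back into ".
        return raw[1:-1].replace(quote + quote, quote)
    return raw  # unquoted cells are returned as-is


line = b'x,"a,b"'
view = raw_cell(line, 2, 7)  # still b'"a,b"', no copy made
value = unquote_cell(view.tobytes())  # b'a,b'
```

A selection-only command can write `view` back out verbatim, whereas anything that inspects cell values needs the `unquote_cell` step.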

Also, you write in your README:

> non-4180-compliant data: zsv is fastest across the board (xan and polars are N/A for this input category)

`xan` can also handle such data, if needed, through the more conventional non-SIMD parser exposed by the `xan input` command.

Best,
