Commit 3df0a8a

[chore] fix merge conflicts

1 parent 872fef8 · commit 3df0a8a

2 files changed: 1 addition & 140 deletions

Cargo.toml

Lines changed: 1 addition & 36 deletions

````diff
@@ -1,18 +1,12 @@
 [package]
 name = "evolution"
-<<<<<<< HEAD
 version = "1.0.0"
-=======
-version = "0.3.5"
->>>>>>> main
 edition = "2021"
 description = "🦖 Evolve your fixed-length data files into Apache Arrow tables, fully parallelized!"
 authors = [
     "Ted Hammarlund <TedHammarlund@gmail.com>",
     "Rickard Lundin <rickard@x14.se>",
     "Wilhelm Ågren <wilhelmagren98@gmail.com>",
-    "Ted Hammarlund <TedHammarlund@gmail.com>",
-    "Rickard Lundin <rickard@x14.se>",
 ]
 
 readme = "README.md"
@@ -29,11 +23,8 @@ keywords = [
 ]
 include = [ "**/*.rs", "Cargo.toml", "LICENSE", "README.md" ]
 default-run = "evolution"
-# This is nom nom nom for SIMD
-
 
 [dependencies]
-<<<<<<< HEAD
 chrono = "0.4.31"
 crossbeam = "0.8.2"
 colored = "2.0.4"
@@ -52,32 +43,6 @@ arrow2 = "0.18.0"
 libc = "0.2.154"
 arrow = "51.0.0"
 parquet = "51.0.0"
-=======
-arrow2 = { version = "0.18.0", features = ["io_ipc"] }
-arrow = { version = "51.0.0", features = ["ipc"] }
-debug_print = "1.0.0"
-arrow-format = "0.8.1"
-arrow-schema = "51.0.0"
-arrow-array = "51.0.0"
-parquet = "51.0.0"
-atoi_simd = "0.15.6"
-chrono = "0.4.38"
-clap = { version = "4.5.4", features = ["default", "derive"] }
-crossbeam = "0.8.4"
-colored = "2.1.0"
-env_logger = "0.11.3"
-half = "2.4.1"
-log = "0.4.21"
-num_cpus = "1.16.0"
-rand = { version = "0.8.5" }
-rayon = { version = "1.10.0" }
-serde = { version = "1.0.201", features = ["derive"] }
-serde_json = "1.0.117"
-threadpool = "1.8.1"
-substring = "1.4.5"
-tempfile = "3.10.1"
-libc = "0.2.154"
->>>>>>> main
 padder = { version = "1.2.0", features = ["serde"] }
 
 [dev-dependencies]
@@ -86,4 +51,4 @@ glob = "0.3.1"
 [features]
 default = []
 rayon = [ "dep:rayon", "dep:atoi_simd" ]
-nightly = []
+nightly = []
````
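The resolved `[features]` table keeps `rayon` as an opt-in feature that pulls in the optional `rayon` and `atoi_simd` dependencies. As a rough sketch of how such a Cargo feature typically gates a code path at compile time (the function below is illustrative, not taken from the evolution crate itself):

```rust
// Illustrative sketch of feature-gated compilation, assuming a crate
// built with or without `--features rayon`. `convert_rows` is a
// hypothetical name, not an evolution API.

#[cfg(feature = "rayon")]
fn convert_rows(rows: &[u64]) -> u64 {
    // Parallel path, only compiled when the `rayon` feature is enabled.
    use rayon::prelude::*;
    rows.par_iter().sum()
}

#[cfg(not(feature = "rayon"))]
fn convert_rows(rows: &[u64]) -> u64 {
    // Single-threaded fallback used when the crate is built
    // without `--features rayon`.
    rows.iter().sum()
}

fn main() {
    let rows: Vec<u64> = (1..=10).collect();
    // Either path computes the same result: 1 + 2 + ... + 10 = 55.
    assert_eq!(convert_rows(&rows), 55);
    println!("sum = {}", convert_rows(&rows));
}
```

The `dep:` prefix in `rayon = [ "dep:rayon", "dep:atoi_simd" ]` marks the dependencies as optional-only, so they are not exposed as implicit features of their own.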

README.md

Lines changed: 0 additions & 104 deletions

````diff
@@ -38,7 +38,6 @@ https://github.com/firelink-data/evolution/edit/feat/single-threaded/README.md
 * [License](https://github.com/firelink-data/evolution#-license)
 
 
-
 ## 📦 Installation
 
 The easiest way to install *evolution* on your system is by using the [Cargo](https://crates.io/) package manager.
@@ -51,7 +50,6 @@ Alternatively, you can build from source by cloning this repo and compiling usin
 git clone https://github.com/firelink-data/evolution.git
 cd evolution
 cargo build --release
-<<<<<<< HEAD
 ```
 
 The program uses either of two different types of threading implementations. The default implementation uses the
@@ -227,106 +225,4 @@ The number of logical cores is calculed as: **threads per core X cores per socke
 
 
 ## 📜 License
-=======
-```
-
-The program uses either of two different types of threading implementations. The default implementation uses the
-standard library threads and has so far proven a more reliable version, the alternative is by using [rayon](https://docs.rs/rayon/latest/rayon/)
-for parallel iteration. To use **rayon** instead, build or install the program with the `--features rayon` flag.
-
-
-## 🚀 Example usage
-
-If you build and/or install the program as explained above then by simply running the binary you will see the following:
-```
-🦖 Evolve your fixed-length data files into Apache Arrow tables, fully parallelized!
-
-Usage: evolution [OPTIONS] <COMMAND>
-
-Commands:
-  convert  Convert a fixed-length file (.flf) to parquet
-  mock     Generate mocked fixed-length files (.flf) for testing purposes
-  help     Print this message or the help of the given subcommand(s)
-
-Options:
-      --n-threads <NUM-THREADS>  Set the number of threads (logical cores) to use when multi-threading [default: 1]
-  -h, --help                     Print help
-  -V, --version                  Print version
-```
-
-The functionality of the program is structured as two main commands: **mock** and **convert**.
-
-### 👨‍🎨 Mocking
-
-```
-Generate mocked fixed-length files (.flf) for testing purposes
-
-Usage: evolution mock [OPTIONS] --schema <SCHEMA>
-
-Options:
-  -s, --schema <SCHEMA>
-          Specify the .json schema file to mock data for
-  -o, --output-file <OUTPUT-FILE>
-          Specify output (target) file name
-  -n, --n-rows <NUM-ROWS>
-          Set the number of rows to generate [default: 100]
-      --buffer-size <BUFFER-SIZE>
-          Set the size of the buffer (number of rows)
-      --thread-channel-capacity <THREAD-CHANNEL-CAPACITY>
-          Set the capacity of the thread channel (number of messages)
-  -h, --help
-          Print help
-```
-
-For example, if you wanted to mock 1 billion rows of a fixed-length file from a schema located at `./my/path/to/schema.json` with
-the output name `mocked-data.flf`, you could run the following command:
-```
-evolution mock --schema ./my/schema/path/schema.json --output-file mocked-data.flf --n-rows 1000000000
-```
-
-### 🏗️👷‍♂️ Converting
-
-```
-Convert a fixed-length file (.flf) to parquet
-
-Usage: evolution convert [OPTIONS] --file <FILE> --schema <SCHEMA>
-
-Options:
-  -f, --file <FILE>
-          The fixed-length file to convert
-  -o, --output-file <OUTPUT-FILE>
-          Specify output (target) file name
-  -s, --schema <SCHEMA>
-          Specify the .json schema file to use when converting
-      --buffer-size <BUFFER-SIZE>
-          Set the size of the buffer (in bytes)
-      --thread-channel-capacity <THREAD-CHANNEL-CAPACITY>
-          Set the capacity of the thread channel (number of messages)
-  -h, --help
-          Print help
-```
-
-To convert a fixed-length file called `really-big-data.flf`, with associated schema located at `./my/path/to/schema.json`, to a parquet file with name `smaller-data.parquet`, you could run the following command:
-```
-evolution convert --file really-big-data.flf --output-file smaller-data.parquet --schema ./my/path/to/schema.json
-```
-
-### 🧵 Threading
-
-There exists a global setting for the program called `--n-threads` which dictates whether or not the invoked command will be executed
-in single- or multithreaded mode. This argument should be a number representing the number of threads (logical cores) that you want
-to use. If you try and set a larger number of threads than you system has logical cores, then the program will use **all available
-logical cores**. If this argument is omitted, then the program will run in single-threaded mode.
-
-**Note that running multithreaded only really has any clear increase in performance for substantially large workloads.**
-
-### 🧵 Converting multithreaded
-An experimental multithreaded implementation exists , it reads chunks of 2 megabytes and splits them into n anmounts of cores in O(1).
-Run a small conversion test using the "arrow" converter with slicer type "chunked"
-```
-$ cargo run --package evolution --release --bin evolution -- c-convert --schema resources/schema/test_schema.json --in-file resources/schema/test_schema_mock.txt --out-file out.parquet arrow chunks
-```
->>>>>>> main
 All code is to be held under a general MIT license, please see [LICENSE](https://github.com/firelink-data/evolution/blob/main/LICENSE) for specific information.
````
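Both hunks above remove leftover `<<<<<<<` / `=======` / `>>>>>>>` markers from an unresolved merge. A quick way to confirm no markers survive is a pattern scan like the following (a generic shell sketch, not part of this repository's tooling; `conflict_demo.txt` is a made-up sample file):

```shell
# Create a sample file containing an unresolved conflict, then count
# lines that start with a conflict marker; a clean file would yield 0.
printf '%s\n' 'version = "1.0.0"' '<<<<<<< HEAD' '=======' '>>>>>>> main' > conflict_demo.txt
grep -cE '^(<{7} |={7}$|>{7} )' conflict_demo.txt
# → 3
```

In a real working tree, `git grep -nE '^(<{7} |={7}$|>{7} )'` runs the same scan across all tracked files.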
