@@ -38,7 +38,6 @@
 * [License](https://github.com/firelink-data/evolution#-license)
 
 
-
 ## 📦 Installation
 
 The easiest way to install *evolution* on your system is by using the [Cargo](https://crates.io/) package manager.
@@ -51,7 +50,6 @@ Alternatively, you can build from source by cloning this repo and compiling usin
 git clone https://github.com/firelink-data/evolution.git
 cd evolution
 cargo build --release
-<<<<<<< HEAD
 ```
 
 The program uses either of two threading implementations. The default implementation uses the
@@ -227,106 +225,4 @@ The number of logical cores is calculated as: **threads per core X cores per socke
 
 
 ## 📜 License
-=======
-```
-
-The program uses either of two threading implementations. The default uses the standard library
-threads and has so far proven more reliable; the alternative uses [rayon](https://docs.rs/rayon/latest/rayon/)
-for parallel iteration. To use **rayon** instead, build or install the program with the `--features rayon` flag.
-
-
-## 🚀 Example usage
-
-If you build and/or install the program as explained above, then simply running the binary prints the following:
-```
-🦖 Evolve your fixed-length data files into Apache Arrow tables, fully parallelized!
-
-Usage: evolution [OPTIONS] <COMMAND>
-
-Commands:
-  convert  Convert a fixed-length file (.flf) to parquet
-  mock     Generate mocked fixed-length files (.flf) for testing purposes
-  help     Print this message or the help of the given subcommand(s)
-
-Options:
-      --n-threads <NUM-THREADS>  Set the number of threads (logical cores) to use when multi-threading [default: 1]
-  -h, --help                     Print help
-  -V, --version                  Print version
-```
-
-The functionality of the program is structured as two main commands: **mock** and **convert**.
-
-### 👨‍🎨 Mocking
-
-```
-Generate mocked fixed-length files (.flf) for testing purposes
-
-Usage: evolution mock [OPTIONS] --schema <SCHEMA>
-
-Options:
-  -s, --schema <SCHEMA>
-          Specify the .json schema file to mock data for
-  -o, --output-file <OUTPUT-FILE>
-          Specify output (target) file name
-  -n, --n-rows <NUM-ROWS>
-          Set the number of rows to generate [default: 100]
-      --buffer-size <BUFFER-SIZE>
-          Set the size of the buffer (number of rows)
-      --thread-channel-capacity <THREAD-CHANNEL-CAPACITY>
-          Set the capacity of the thread channel (number of messages)
-  -h, --help
-          Print help
-```
-
-For example, if you wanted to mock 1 billion rows of a fixed-length file from a schema located at `./my/path/to/schema.json` with
-the output name `mocked-data.flf`, you could run the following command:
-```
-evolution mock --schema ./my/path/to/schema.json --output-file mocked-data.flf --n-rows 1000000000
-```
-
-### 🏗️👷‍♂️ Converting
-
-```
-Convert a fixed-length file (.flf) to parquet
-
-Usage: evolution convert [OPTIONS] --file <FILE> --schema <SCHEMA>
-
-Options:
-  -f, --file <FILE>
-          The fixed-length file to convert
-  -o, --output-file <OUTPUT-FILE>
-          Specify output (target) file name
-  -s, --schema <SCHEMA>
-          Specify the .json schema file to use when converting
-      --buffer-size <BUFFER-SIZE>
-          Set the size of the buffer (in bytes)
-      --thread-channel-capacity <THREAD-CHANNEL-CAPACITY>
-          Set the capacity of the thread channel (number of messages)
-  -h, --help
-          Print help
-```
-
-To convert a fixed-length file called `really-big-data.flf`, with an associated schema located at `./my/path/to/schema.json`, to a parquet file named `smaller-data.parquet`, you could run the following command:
-```
-evolution convert --file really-big-data.flf --output-file smaller-data.parquet --schema ./my/path/to/schema.json
-```
-
-### 🧵 Threading
-
-There exists a global setting for the program called `--n-threads` which dictates whether the invoked command will be executed
-in single- or multithreaded mode. This argument should be the number of threads (logical cores) that you want
-to use. If you try to set a larger number of threads than your system has logical cores, then the program will use **all available
-logical cores**. If this argument is omitted, then the program will run in single-threaded mode.
-
-**Note that running multithreaded only yields a clear increase in performance for substantially large workloads.**
-
-### 🧵 Converting multithreaded
-An experimental multithreaded implementation exists: it reads chunks of 2 megabytes and splits them across n cores in O(1).
-Run a small conversion test using the "arrow" converter with slicer type "chunked":
-```
-$ cargo run --package evolution --release --bin evolution -- c-convert --schema resources/schema/test_schema.json --in-file resources/schema/test_schema_mock.txt --out-file out.parquet arrow chunks
-```
-
-## 📋 License
->>>>>>> main
 All code is held under a general MIT license; please see [LICENSE](https://github.com/firelink-data/evolution/blob/main/LICENSE) for specific information.
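The third hunk header above references the README's formula for the number of logical cores: **threads per core X cores per socket X number of sockets**. A minimal sketch of that arithmetic in shell, using assumed topology values for a hypothetical single-socket machine (the numbers are illustrative, not read from the OS):

```shell
#!/bin/sh
# Hypothetical topology values (assumptions, not queried from the system).
threads_per_core=2
cores_per_socket=8
sockets=1

# logical cores = threads per core x cores per socket x number of sockets
logical_cores=$((threads_per_core * cores_per_socket * sockets))
echo "logical cores: $logical_cores"
```

On Linux, the real values for each factor are reported by `lscpu` ("Thread(s) per core", "Core(s) per socket", "Socket(s)").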