@@ -38,6 +57,63 @@ standard library threads and has so far proven a more reliable version, the alte
38
57
for parallel iteration. To use **rayon** instead, build or install the program with the `--features rayon` flag.
39
58
40
59
60
+
## 📝 Schema setup
61
+
62
+
All commands in *evolution* require a valid **schema**. A schema, in this context, is a [JSON](https://www.json.org/json-en.html)
file that specifies the layout of a fixed-length file. Every schema has to follow
[this template](https://github.com/firelink-data/evolution/tree/main/resources/template-schema.json). If you are unsure whether your own schema
file is valid according to the template, you can check it with [this validator tool](https://www.jsonschemavalidator.net/).
66
+
67
+
An example schema can be found [here](https://github.com/firelink-data/evolution/tree/main/resources/example-schema.json), and looks like this:
68
+
```
{
    "name": "EvolutionExampleSchema",
    "version": 1337,
    "columns": [
        {
            "name": "id",
            "offset": 0,
            "length": 9,
            "dtype": "i32",
            "alignment": "Right",
            "pad_symbol": "Zero",
            "is_nullable": false
        },
        {
            "name": "name",
            "offset": 9,
            "length": 32,
            "dtype": "utf8",
            "is_nullable": true
        },
        {
            "name": "city",
            "offset": 41,
            "length": 32,
            "dtype": "utf8",
            "alignment": "Right",
            "pad_symbol": "Backslash",
            "is_nullable": false
        },
        {
            "name": "employed",
            "offset": 73,
            "length": 5,
            "dtype": "boolean",
            "alignment": "Center",
            "pad_symbol": "Asterisk",
            "is_nullable": false
        }
    ]
}
```
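To make the role of `offset` and `length` concrete, here is a minimal Rust sketch that cuts one row into the four fields of the example schema. It is not code from *evolution*: the `Column` struct, the helper functions, and the sample row are hypothetical, and the sketch assumes that offsets and lengths count bytes in a plain ASCII row.

```rust
// Hypothetical mirror of three schema fields; not a type from evolution.
struct Column {
    name: &'static str,
    offset: usize,
    length: usize,
}

// Column positions copied from the example schema.
fn example_columns() -> Vec<Column> {
    vec![
        Column { name: "id", offset: 0, length: 9 },
        Column { name: "name", offset: 9, length: 32 },
        Column { name: "city", offset: 41, length: 32 },
        Column { name: "employed", offset: 73, length: 5 },
    ]
}

// Assumption: offsets and lengths count bytes and the row is plain ASCII.
// Returns None if the row is too short for the column.
fn slice_field<'a>(row: &'a str, col: &Column) -> Option<&'a str> {
    row.get(col.offset..col.offset + col.length)
}

// Builds one made-up row that follows the padding settings of the example schema.
fn example_row() -> String {
    // id: right-aligned, padded with zeros to 9 characters.
    let id = format!("{:0>9}", 42);
    // name: right-aligned, padded with whitespace to 32 characters.
    let name = format!("{:>32}", "Alice");
    // city: right-aligned, padded with backslashes to 32 characters.
    let city = format!("{}{}", "\\".repeat(32 - "Stockholm".len()), "Stockholm");
    // employed: centered, padded with asterisks to 5 characters.
    let employed = format!("{:*^5}", "true");
    format!("{}{}{}{}", id, name, city, employed)
}

fn main() {
    let row = example_row();
    for col in example_columns() {
        println!("{:>8}: {:?}", col.name, slice_field(&row, &col));
    }
}
```

In this example every column starts where the previous one ends, so one row is 9 + 32 + 32 + 5 = 78 bytes wide.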
110
+
111
+
As specified in the template, every column has to provide the fields **name**, **offset**, **length** and **is_nullable**, whereas
**alignment** and **pad_symbol** can be omitted (as they are for the *name* column in this example). If omitted, they take their default values,
"**Right**" and "**Whitespace**" respectively. These defaults come from the [padder](https://github.com/firelink-data/padder) crate, which defines the enums
`Alignment` and `Symbol` with default implementations `Alignment::Right` and `Symbol::Whitespace`.
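The fallback to default values can be sketched in Rust as follows. The enums below are hypothetical stand-ins written for this illustration, not the actual definitions from the padder crate; they list only the variants that appear in the example schema, and `resolve_padding` is an assumed helper name.

```rust
// Hypothetical stand-in for padder's `Alignment`; only variants seen in the example schema.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Alignment {
    Right,
    Center,
}

impl Default for Alignment {
    fn default() -> Self {
        Alignment::Right
    }
}

// Hypothetical stand-in for padder's `Symbol`; only variants seen in the example schema.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Symbol {
    Whitespace,
    Zero,
    Backslash,
    Asterisk,
}

impl Default for Symbol {
    fn default() -> Self {
        Symbol::Whitespace
    }
}

// A field that is missing from the schema (None) falls back to its default.
fn resolve_padding(
    alignment: Option<Alignment>,
    pad_symbol: Option<Symbol>,
) -> (Alignment, Symbol) {
    (alignment.unwrap_or_default(), pad_symbol.unwrap_or_default())
}

fn main() {
    // The *name* column omits both fields.
    println!("{:?}", resolve_padding(None, None));
    // The *id* column sets both fields explicitly.
    println!("{:?}", resolve_padding(Some(Alignment::Right), Some(Symbol::Zero)));
    // The *city* and *employed* columns do the same with other symbols.
    println!("{:?}", resolve_padding(Some(Alignment::Right), Some(Symbol::Backslash)));
    println!("{:?}", resolve_padding(Some(Alignment::Center), Some(Symbol::Asterisk)));
}
```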
115
+
116
+
41
117
## 🚀 Example usage
42
118
43
119
If you build or install the program as explained above, running the binary prints the following:
@@ -57,63 +133,66 @@ Options:
57
133
-V, --version Print version
58
134
```
59
135
60
-
The functionality of the program is structured as two main commands: **mock** and **convert**.
136
+
As shown above, the program provides two main commands: **convert** and **mock**.
61
137
62
-
### 👨🎨 Mocking
138
+
139
+
### 🏗️👷♂️ Converting
63
140
64
141
```
65
-
Generate mocked fixed-length files (.flf) for testing purposes
Set the capacity of the thread channel (number of messages)
80
157
-h, --help
81
158
Print help
82
159
```
83
160
84
-
For example, if you wanted to mock 1 billion rows of a fixed-length file from a schema located at `./my/path/to/schema.json` with
85
-
the output name `mocked-data.flf`, you could run the following command:
161
+
To convert a fixed-length file called `really-big-data.flf`, with associated schema located at `./my/path/to/schema.json`, to a parquet file with name `smaller-data.parquet`, you could run the following command:
Set the capacity of the thread channel (number of messages)
108
185
-h, --help
109
186
Print help
110
187
```
111
188
112
-
To convert a fixed-length file called `really-big-data.flf`, with associated schema located at `./my/path/to/schema.json`, to a parquet file with name `smaller-data.parquet`, you could run the following command:
189
+
For example, if you wanted to mock 1 billion rows of a fixed-length file from a schema located at `./my/path/to/schema.json` with
190
+
the output name `mocked-data.flf`, you could run the following command:
Use the value found under **NumberOfLogicalProcessors**.
218
+
219
+
### Unix
131
220
```
221
+
lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
222
+
```
223
+
224
+
The number of logical cores is calculated as: **threads per core × cores per socket × sockets**.
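As a worked example of this formula, the following Rust sketch reads the three factors from `lscpu` text and multiplies them. The function name `logical_cores` and the sample text are made up for illustration, and the sketch assumes the usual `Key: value` layout of `lscpu` output.

```rust
// Made-up sample in the usual `lscpu` layout; the values are illustrative only.
const SAMPLE: &str = "CPU(s):                16
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             1
";

// Multiplies threads per core, cores per socket, and sockets.
// Returns None if any of the three lines is missing or is not a number.
fn logical_cores(lscpu_output: &str) -> Option<usize> {
    let mut threads = None;
    let mut cores = None;
    let mut sockets = None;
    for line in lscpu_output.lines() {
        if let Some((key, value)) = line.split_once(':') {
            let value = value.trim().parse::<usize>().ok();
            match key.trim() {
                "Thread(s) per core" => threads = value,
                "Core(s) per socket" => cores = value,
                "Socket(s)" => sockets = value,
                _ => {} // other lines, such as "CPU(s)", are ignored
            }
        }
    }
    Some(threads? * cores? * sockets?)
}

fn main() {
    // For the sample above: 2 * 8 * 1
    println!("{:?}", logical_cores(SAMPLE));
}
```

With the sample values the product is 2 × 8 × 1 = 16, which matches the `CPU(s)` line in the same sample.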
225
+
132
226
133
-
## 📋 License
227
+
## 📜 License
134
228
All code is held under a general MIT license; please see [LICENSE](https://github.com/firelink-data/evolution/blob/main/LICENSE) for specific information.