
Commit d360527

add biochemistry example (#553)
* work on biochemistry example
* apply formatting changes

---------

Co-authored-by: Logende <Logende@users.noreply.github.com>
1 parent 7ce3041 commit d360527

29 files changed

Lines changed: 969 additions & 11 deletions

examples/biochemistry/README.md

Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
1+
# Biochemistry Example
2+
3+
This hands-on example demonstrates how to:
4+
5+
1. Import a table from Excel/CSV into MetaConfigurator
6+
2. Automatically infer the data model and generate a schema
7+
3. Use the MetaConfigurator UI to make changes to the schema
8+
4. Export the schema and data to a JSON file
9+
5. Benefit from having the data and schema in a machine-readable format by applying a simple Python script that enriches the synthesis data with additional metadata from the PubChem database
10+
11+
The example is based on a dataset of metal-organic framework (MOF) synthesis data.
12+
MetaConfigurator itself, however, is a generic tool and can be used for any kind of data from any domain.
13+
14+
The goal is to demonstrate how data from Excel/CSV can be turned machine-readable using MetaConfigurator.
15+
It also shows how a data model (schema) can be created, which helps to communicate the structure of the data to others.
16+
Finally, we show that having the data in this machine-readable format allows for easy integration with other tools and services, such as the PubChem database.
17+
This applies to any kind of data: once it is in a machine-readable format, any tooling, programming language, or machine learning method can be applied to it.
18+
19+
## Step 1: Import the data
20+
21+
Download the [ec-mof-synthesis.csv](ec-mof-synthesis.csv) file.
22+
Note that the data was originally in an Excel file and has already been exported to CSV format.
23+
24+
Open MetaConfigurator and click on the "Import Data..." button (not to be confused with the "Open Data" button).
25+
26+
![Import Data](figs/import_data.png)
27+
28+
Select the "Import CSV Data" option and choose the `ec-mof-synthesis.csv` file.
29+
30+
![Import CSV Data](figs/import_csv_data.png)
31+
32+
![Select CSV Document](figs/select_csv_document.png)
33+
34+
Keep the "Independent Table" and "Infer and generate schema for the data" options selected.
35+
36+
![Import Options](figs/import_options.png)
37+
38+
Press the "Import" button to import the data and generate the schema.
39+
40+
![Import Button](figs/import_button.png)
41+
42+
Now the data is successfully imported and the schema is generated.
43+
44+
![Imported Data](figs/imported_data.png)
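
What the import step does can be approximated outside the tool. The following is a minimal sketch, not MetaConfigurator's actual implementation: it assumes that column headers are turned into snake_case keys (as the keys in `ecmofsynthesis.json` suggest) and that numeric strings become numbers. The function names `normalize_key`, `parse_value`, `csv_to_records` and `infer_types` are hypothetical, and the sample uses two rows taken from `ec-mof-synthesis.csv`.

```python
import csv
import io

# Header and two rows copied from ec-mof-synthesis.csv
SAMPLE_CSV = "\n".join([
    "Vial No,Date,Metal salt name,Metal salt mass,Metal salt mass unit,"
    "Linker name,Linker mass,Linker mass unit,Solvent,Temperature ,"
    "Temperature Unit,Time ,Time Unit,Place,phase purity",
    'S-1,01.02.2024,FeCl3 ,16,mg,"Benzene-1,4-dicarboxylic acid",16,mg,'
    "DMF,60,deg C,1,day,oven,yes",
    'S-6,01.02.2024,FeCl3,16,mg,"Benzene-1,4-dicarboxylic acid",16,mg,'
    "DMF,120,deg C,1,hours,microwave,yes",
])


def normalize_key(header):
    """Turn a header such as 'Metal salt name' into 'metal_salt_name'."""
    return "_".join(header.strip().lower().split())


def parse_value(text):
    """Convert numeric strings to numbers; keep everything else as a stripped string."""
    text = text.strip()
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def csv_to_records(csv_text):
    """Read CSV text into a list of dicts with normalized keys and parsed values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {normalize_key(key): parse_value(value) for key, value in row.items()}
        for row in reader
    ]


def infer_types(records):
    """Map each column to a JSON Schema type name based on the values seen."""
    names = {int: "integer", float: "number", str: "string"}
    inferred = {}
    for record in records:
        for key, value in record.items():
            seen = names[type(value)]
            previous = inferred.setdefault(key, seen)
            if previous != seen:
                # Mixed integers and floats widen to "number"; anything else to "string".
                numeric = {previous, seen} == {"integer", "number"}
                inferred[key] = "number" if numeric else "string"
    return inferred


records = csv_to_records(SAMPLE_CSV)
print(infer_types(records))
```

In this sketch `phase_purity` comes out as a string, because the CSV stores it as `yes`/`no`. That matches the inference result addressed in Step 2.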
45+
46+
47+
## Step 2: Making changes to the schema
48+
49+
The schema generated by MetaConfigurator can be further refined.
50+
Notice that the `phase_purity` attribute is inferred as a string, but it should be a boolean (True/False) value.
51+
52+
First, let's navigate to the Schema Editor tab by clicking the "Data Editor" button on the top left and then selecting the "Schema Editor" tab.
53+
54+
![Schema Editor](figs/change_to_schema_editor.png)
55+
56+
This tab has different available views.
57+
The text view shows the schema in its raw text form in JSON format.
58+
It is best suited to advanced users who are familiar with the JSON Schema format.
59+
The GUI view is more user-friendly and assists the user by showing all options available for defining the schema.
60+
The simplest view is the diagram view, which shows the schema in a graphical form.
61+
Some more advanced schema options, such as conditionals and composition, cannot be expressed in the diagram view and require the GUI or text view.
62+
For our example, the diagram view is sufficient.
63+
Hence, let's hide the other views and open only the diagram view.
64+
This can be done using the top toolbar.
65+
66+
![Schema View](figs/schema_view_1.png)
67+
68+
Click on the buttons to hide the text editor and the GUI editor.
69+
Afterward, only the diagram view should be visible.
70+
71+
![Schema View with only diagram](figs/schema_view_2.png)
72+
73+
In the diagram, click on the `phase_purity` attribute to edit it.
74+
Then, change the type to "boolean".
75+
76+
![Changing attribute type](figs/changing_attribute_type.png)
77+
78+
In the top menu bar, click the "Show Preview of resulting GUI" button to see what the schema will look like in the GUI view.
79+
80+
![Show Preview](figs/show_preview.png)
81+
82+
In the GUI view, the `phase_purity` attribute is now represented as a checkbox instead of a text field.
83+
Todo: auto-convert yes and no values?
84+
85+
![Result of attribute type change](figs/changing_attribute_type_result.png)
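
Changing the type in the schema does not by itself say what happens to the existing `yes`/`no` strings in the data. If they need to be converted by hand, a small script along the following lines could do it. This is a sketch under assumptions: the helper names `to_boolean` and `convert_phase_purity` are hypothetical, and the set of accepted spellings is a guess that should be adapted to the data at hand.

```python
# Accepted spellings are an assumption; extend them to match your data.
TRUE_VALUES = {"yes", "true"}
FALSE_VALUES = {"no", "false"}


def to_boolean(value):
    """Convert a yes/no style value to a bool; raise on anything unexpected."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"cannot interpret {value!r} as a boolean")


def convert_phase_purity(records):
    """Return copies of the records with `phase_purity` converted to a bool."""
    return [
        {**record, "phase_purity": to_boolean(record["phase_purity"])}
        for record in records
    ]


converted = convert_phase_purity([
    {"vial_no": "S-1", "phase_purity": "yes"},
    {"vial_no": "S-8", "phase_purity": "no"},
])
print(converted)
```

Raising an error on unknown spellings, rather than silently mapping them to `False`, keeps data-entry mistakes visible.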
86+
87+
In our data, we also notice that `metal_salt_mass_unit` and `linker_mass_unit` both seem to be the same unit.
88+
Rather than allowing an arbitrary string for these fields, we can define a common unit type, which is used by both attributes.
89+
This can be done by adding a new enum to the schema, by clicking on the "Add Enum" button in the diagram view.
90+
91+
![Add Enum](figs/add_enum.png)
92+
93+
A new dummy enumeration will be added to the schema.
94+
Change the name of the enumeration to `mass_unit` and add the possible values `kg`, `g` and `mg`.
95+
96+
![Mass Unit Enum](figs/mass_unit_enum.png)
97+
98+
Now, change the type of the `metal_salt_mass_unit` and `linker_mass_unit` attributes to the newly defined `mass_unit` enumeration.
99+
100+
![Change attribute type to enum](figs/change_attribute_type_to_enum.png)
101+
102+
This will automatically create new edges from the attributes to the enumeration in the diagram.
103+
Notice that in the GUI preview, the `metal_salt_mass_unit` and `linker_mass_unit` attributes are now represented as dropdowns with the possible values.
104+
105+
![Result of attribute type change to enum](figs/change_attribute_type_to_enum_result.png)
106+
107+
In the same manner, create a new enum `time_unit` with the values `s`, `min`, `h`, `day` and `week`.
108+
Then change the type of the `time_unit` attribute to the new enumeration.
109+
110+
Also do the same for the `temperature_unit` attribute, with the values `K`, `deg C` and `deg F`.
111+
112+
The resulting schema is provided in the [ecmofsynthesis.schema.json](ecmofsynthesis.schema.json) file.
113+
The data is provided in the [ecmofsynthesis.json](ecmofsynthesis.json) file.
114+
115+
![Schema Result 1](figs/schema_result_1.png)
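
In JSON Schema terms, a shared enumeration is typically a single definition that several properties reference. The fragment below is a sketch of that idea written as a Python dictionary, together with a small check that reports values outside the enumeration. It is not copied from `ecmofsynthesis.schema.json`: the use of `$defs` and `$ref` is an assumption about how the tool stores the enum, and the helpers `allowed_values` and `invalid_units` are hypothetical.

```python
# Assumed shape of the relevant schema fragment (the real file may differ).
SCHEMA_FRAGMENT = {
    "$defs": {
        "mass_unit": {"type": "string", "enum": ["kg", "g", "mg"]},
    },
    "properties": {
        "metal_salt_mass_unit": {"$ref": "#/$defs/mass_unit"},
        "linker_mass_unit": {"$ref": "#/$defs/mass_unit"},
    },
}


def allowed_values(schema, property_name):
    """Follow a local '#/...' reference and return the enum values it points to."""
    ref = schema["properties"][property_name]["$ref"]
    node = schema
    for part in ref.removeprefix("#/").split("/"):
        node = node[part]
    return node["enum"]


def invalid_units(records, schema):
    """List (vial_no, property, value) for every value outside its enumeration."""
    problems = []
    for record in records:
        for name in schema["properties"]:
            if name in record and record[name] not in allowed_values(schema, name):
                problems.append((record.get("vial_no"), name, record[name]))
    return problems


print(invalid_units(
    [{"vial_no": "S-1", "metal_salt_mass_unit": "mg", "linker_mass_unit": "mg"}],
    SCHEMA_FRAGMENT,
))
```

Because both properties point at the same definition, adding a unit later means editing one list instead of two.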
116+
117+
## Step 3: Enriching the data with additional metadata
118+
119+
With this initial data model (schema) and data in place, we now want to apply a Python script to the data that extends all compounds (metal salt and linker) with additional metadata from the PubChem database:
120+
- InChI code
121+
- SMILES code
122+
- Molecular Weight
123+
- cid (PubChem Compound ID)
124+
125+
Let's first adapt our data model and add the new attributes to the schema.
126+
Because the new attributes apply to both the `metal_salt` and `linker` compounds, we define a new schema object `compound`.
127+
128+
Click on the 'Add Object' button in the diagram view to add a new object to the schema.
129+
Change the name of the object to `compound`.
130+
Add the following attributes:
131+
- `inchi_code` (string)
132+
- `smiles_code` (string)
133+
- `molecular_weight` (number)
134+
- `cid` (number)
135+
136+
![Compound Object Node](figs/compound_node.png)
137+
138+
Now, we can introduce a new property `metal_salt` of type `compound` to the schema.
139+
Let's do the same for the `linker` property.
140+
141+
![Schema Result 2](figs/schema_result_2.png)
142+
143+
The resulting schema is provided in the [ecmofsynthesis_enriched.schema.json](ecmofsynthesis_enriched.schema.json) file.
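
As a rough illustration of the data model described above, the `compound` object and the two properties that use it could look like the following. This is a sketch, not the content of `ecmofsynthesis_enriched.schema.json`: the `$defs`/`$ref` layout is an assumption, and `mistyped_fields` is a hypothetical helper that only checks the four attribute types.

```python
# Assumed shape of the `compound` definition (the real schema file may differ).
COMPOUND_SCHEMA = {
    "type": "object",
    "properties": {
        "inchi_code": {"type": "string"},
        "smiles_code": {"type": "string"},
        "molecular_weight": {"type": "number"},
        "cid": {"type": "number"},
    },
}

# Both new properties reuse the same definition.
ENRICHED_PROPERTIES = {
    "metal_salt": {"$ref": "#/$defs/compound"},
    "linker": {"$ref": "#/$defs/compound"},
}

PYTHON_TYPES = {"string": str, "number": (int, float)}


def mistyped_fields(compound):
    """Return the names of attributes whose value does not match the declared type."""
    bad = []
    for name, spec in COMPOUND_SCHEMA["properties"].items():
        if name not in compound:
            continue
        value = compound[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, PYTHON_TYPES[spec["type"]]):
            bad.append(name)
    return bad


# Placeholder values for illustration only, not real PubChem data.
example = {
    "inchi_code": "InChI=...",
    "smiles_code": "...",
    "molecular_weight": 0.0,
    "cid": 0,
}
print(mistyped_fields(example))
```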
144+
145+
Note that the design of the data model itself depends on preference and use case.
146+
In this example, we added a new object `compound` to the schema, which is used by both the `metal_salt` and `linker` properties.
147+
We did NOT CHANGE any existing properties, but only added new ones.
148+
It would also be a valid choice to move the `metal_salt_mass_unit`, `metal_salt_mass`, `linker_mass_unit` and `linker_mass` attributes to the `compound` object.
149+
The same applies to the `linker_name` and `metal_salt_name` properties.
150+
They could also be fully removed, as the `cid` attribute will be used to identify the compounds (exception: if no corresponding molecule is found in PubChem, then information would be lost).
151+
152+
Now, let's write a Python script to enrich the data with the additional metadata.
153+
The script is provided in the [enrich_data.py](enrich_data.py) file.
154+
The resulting JSON file is provided in the [ecmofsynthesis_enriched.json](ecmofsynthesis_enriched.json) file.
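
For readers who want a feel for what such an enrichment involves, here is a minimal sketch. It is not the content of `enrich_data.py`. It assumes a lookup by compound name through the PubChem PUG REST interface, and the function names `pubchem_lookup` and `enrich` are hypothetical. The lookup is passed in as a parameter so the enrichment logic can be tried without network access. The name of the SMILES field in the response is an assumption, which is why several candidate keys are checked.

```python
import json
import urllib.parse
import urllib.request

# PUG REST endpoint for a property lookup by compound name.
PUG_REST = (
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/"
    "{name}/property/{props}/JSON"
)
PROPERTIES = "InChI,CanonicalSMILES,MolecularWeight"


def pubchem_lookup(name):
    """Query PubChem by name; return a `compound` dict, or None if nothing is found."""
    url = PUG_REST.format(name=urllib.parse.quote(name, safe=""), props=PROPERTIES)
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            payload = json.load(response)
    except OSError:  # covers URLError and HTTPError (e.g. 404 for unknown names)
        return None
    hits = payload.get("PropertyTable", {}).get("Properties", [])
    if not hits:
        return None
    hit = hits[0]
    # The SMILES key name is an assumption; try a few candidates.
    smiles = hit.get("CanonicalSMILES") or hit.get("ConnectivitySMILES") or hit.get("SMILES")
    weight = hit.get("MolecularWeight")
    return {
        "inchi_code": hit.get("InChI"),
        "smiles_code": smiles,
        "molecular_weight": float(weight) if weight is not None else None,
        "cid": hit.get("CID"),
    }


def enrich(records, lookup=pubchem_lookup):
    """Return copies of the records with `metal_salt` and `linker` objects added."""
    cache = {}  # each distinct name is looked up only once
    enriched = []
    for record in records:
        record = dict(record)
        for role in ("metal_salt", "linker"):
            name = record[f"{role}_name"].strip()
            if name not in cache:
                cache[name] = lookup(name)
            if cache[name] is not None:
                record[role] = dict(cache[name])
        enriched.append(record)
    return enriched
```

The existing `*_name` properties are left untouched, in line with the choice above to only add new properties. When a name is not found, the record simply gets no `metal_salt` or `linker` object. To process the files, load the records with `json.load`, pass them to `enrich`, and write the result with `json.dump`.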
155+
156+
If we load the enriched data into MetaConfigurator (using the 'Import Data' button), we can see that the `metal_salt` and `linker` properties now have the additional metadata.
157+
158+
![Enriched Data](figs/enriched_data.png)
159+
160+
## Takeaways
161+
162+
This example demonstrated how to import data from Excel/CSV into MetaConfigurator, generate a schema and make changes to the schema.
163+
It also showed how to enrich the data with additional metadata from the PubChem database using a Python script.
164+
165+
Once the data is in a machine-readable format, it is easy to apply additional tools and services to it.
166+
This can be useful for data integration, data analysis, data visualization, machine learning and many other applications.
167+
MetaConfigurator is a generic tool and can be used for any kind of data from any domain.
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
Vial No,Date,Metal salt name,Metal salt mass,Metal salt mass unit,Linker name,Linker mass,Linker mass unit,Solvent,Temperature ,Temperature Unit,Time ,Time Unit,Place,phase purity
2+
S-1,01.02.2024,FeCl3 ,16,mg,"Benzene-1,4-dicarboxylic acid",16,mg,DMF,60,deg C,1,day,oven,yes
3+
S-2,01.02.2024,FeCl3,16,mg,"Benzene-1,4-dicarboxylic acid",16,mg,DMF,80,deg C,1,day,oven,yes
4+
S-3,01.02.2024,FeCl3,16,mg,"Benzene-1,4-dicarboxylic acid",16,mg,DMF,100,deg C,1,day,oven,yes
5+
S-4,01.02.2024,FeCl3,16,mg,"Benzene-1,4-dicarboxylic acid",16,mg,Dry DMF,120,deg C,1,day,oven,yes
6+
S-5,01.02.2024,FeCl3,16,mg,"Benzene-1,4-dicarboxylic acid",16,mg,DMF,120,deg C,1,day,oven,yes
7+
S-6,01.02.2024,FeCl3,16,mg,"Benzene-1,4-dicarboxylic acid",16,mg,DMF,120,deg C,1,hours,microwave,yes
8+
S-7,01.02.2024,FeCl3,16,mg,"Benzene-1,4-dicarboxylic acid",16,mg,DMF,140,deg C,2,hours,microwave,yes
9+
S-8,01.02.2024,FeCl3,16,mg,"Benzene-1,4-dicarboxylic acid",16,mg,"DMF, NaOH",120,deg C,1,day,oven,no
10+
S-9,01.02.2024,FeCl3,16,mg,"Benzene-1,4-dicarboxylic acid",16,mg,"DMF, HCI",120,deg C,1,day,oven,no
11+
S-10,01.02.2024,FeCl3·6H2O,27,mg,"Benzene-1,4-dicarboxylic acid",16,mg,DMF,60,deg C,1,day,oven,no
12+
S-11,01.02.2024,FeCl3·6H2O,27,mg,"Benzene-1,4-dicarboxylic acid",16,mg,DMF,80,deg C,1,day,oven,no
13+
S-12,01.02.2024,FeCl3·6H2O,27,mg,"Benzene-1,4-dicarboxylic acid",16,mg,DMF,100,deg C,1,day,oven,no
14+
S-13,01.02.2024,FeCl3·6H2O,27,mg,"Benzene-1,4-dicarboxylic acid",16,mg,DMF,120,deg C,1,day,oven,no
15+
S-14,01.02.2024,FeCl3·6H2O,27,mg,"Benzene-1,4-dicarboxylic acid",16,mg,DMF,140,deg C,2,hours,microwave,no
16+
S-15,01.02.2024,FeCl3·6H2O,27,mg,"Benzene-1,4-dicarboxylic acid",16,mg,"DMF, NaOH",120,deg C,1,day,oven,no
17+
S-16,01.02.2024,FeCl3·6H2O,27,mg,"Benzene-1,4-dicarboxylic acid",16,mg,"DMF, HCI",90,deg C,1,day,oven,no
18+
S-17,01.02.2024,FeCl3·6H2O,27,mg,"Benzene-1,4-dicarboxylic acid",16,mg,"DMF, HCI",120,deg C,1,day,oven,no

examples/biochemistry/ecmofsynthesis.json

Lines changed: 11 additions & 11 deletions
@@ -98,7 +98,7 @@
9898
"temperature": 120,
9999
"temperature_unit": "deg C",
100100
"time": 1,
101-
"time_unit": "hours",
101+
"time_unit": "hour",
102102
"place": "microwave",
103103
"phase_purity": true
104104
},
@@ -115,7 +115,7 @@
115115
"temperature": 140,
116116
"temperature_unit": "deg C",
117117
"time": 2,
118-
"time_unit": "hours",
118+
"time_unit": "hour",
119119
"place": "microwave",
120120
"phase_purity": true
121121
},
@@ -156,7 +156,7 @@
156156
{
157157
"vial_no": "S-10",
158158
"date": "01.02.2024",
159-
"metal_salt_name": "FeCl36H2O",
159+
"metal_salt_name": "FeCl3.6H2O",
160160
"metal_salt_mass": 27,
161161
"metal_salt_mass_unit": "mg",
162162
"linker_name": "Benzene-1,4-dicarboxylic acid",
@@ -173,7 +173,7 @@
173173
{
174174
"vial_no": "S-11",
175175
"date": "01.02.2024",
176-
"metal_salt_name": "FeCl36H2O",
176+
"metal_salt_name": "FeCl3.6H2O",
177177
"metal_salt_mass": 27,
178178
"metal_salt_mass_unit": "mg",
179179
"linker_name": "Benzene-1,4-dicarboxylic acid",
@@ -190,7 +190,7 @@
190190
{
191191
"vial_no": "S-12",
192192
"date": "01.02.2024",
193-
"metal_salt_name": "FeCl36H2O",
193+
"metal_salt_name": "FeCl3.6H2O",
194194
"metal_salt_mass": 27,
195195
"metal_salt_mass_unit": "mg",
196196
"linker_name": "Benzene-1,4-dicarboxylic acid",
@@ -207,7 +207,7 @@
207207
{
208208
"vial_no": "S-13",
209209
"date": "01.02.2024",
210-
"metal_salt_name": "FeCl36H2O",
210+
"metal_salt_name": "FeCl3.6H2O",
211211
"metal_salt_mass": 27,
212212
"metal_salt_mass_unit": "mg",
213213
"linker_name": "Benzene-1,4-dicarboxylic acid",
@@ -224,7 +224,7 @@
224224
{
225225
"vial_no": "S-14",
226226
"date": "01.02.2024",
227-
"metal_salt_name": "FeCl36H2O",
227+
"metal_salt_name": "FeCl3.6H2O",
228228
"metal_salt_mass": 27,
229229
"metal_salt_mass_unit": "mg",
230230
"linker_name": "Benzene-1,4-dicarboxylic acid",
@@ -234,14 +234,14 @@
234234
"temperature": 140,
235235
"temperature_unit": "deg C",
236236
"time": 2,
237-
"time_unit": "hours",
237+
"time_unit": "hour",
238238
"place": "microwave",
239239
"phase_purity": false
240240
},
241241
{
242242
"vial_no": "S-15",
243243
"date": "01.02.2024",
244-
"metal_salt_name": "FeCl36H2O",
244+
"metal_salt_name": "FeCl3.6H2O",
245245
"metal_salt_mass": 27,
246246
"metal_salt_mass_unit": "mg",
247247
"linker_name": "Benzene-1,4-dicarboxylic acid",
@@ -258,7 +258,7 @@
258258
{
259259
"vial_no": "S-16",
260260
"date": "01.02.2024",
261-
"metal_salt_name": "FeCl36H2O",
261+
"metal_salt_name": "FeCl3.6H2O",
262262
"metal_salt_mass": 27,
263263
"metal_salt_mass_unit": "mg",
264264
"linker_name": "Benzene-1,4-dicarboxylic acid",
@@ -275,7 +275,7 @@
275275
{
276276
"vial_no": "S-17",
277277
"date": "01.02.2024",
278-
"metal_salt_name": "FeCl36H2O",
278+
"metal_salt_name": "FeCl3.6H2O",
279279
"metal_salt_mass": 27,
280280
"metal_salt_mass_unit": "mg",
281281
"linker_name": "Benzene-1,4-dicarboxylic acid",
