Acropolis of Athena: detail of a restoration. Once the shape is known, filling the holes is possible. The same goes for datasets, if you describe them with json-SCHEMA.
Using the json-SCHEMA standard for scientific applications
JSON Schema is a standard used in web applications to define what data an application needs and how it can be modified. This standard is now mature and ubiquitous, with excellent documentation.
Meanwhile, scientific applications often use large and very complex inputs without any standard. For engineering applications, for example, having four shape-shifting families of inputs is common. The higher the fidelity of the simulation, the more inputs you get. An aeronautical combustion chamber setup typically exposes more than 3000 degrees of freedom (d.o.f.).
We will see how the SCHEMA standard can help to validate the input, document it precisely, auto-fill the missing parts, and even create graphical user interfaces.
Our test case: the flow past an obstacle
Vortex shedding behind an obstacle is the origin of many natural phenomena, from turning exhaled air into voice to bridges destroyed by the wind. The flow past a cylinder is an academic Computational Fluid Dynamics (CFD) test case for vortex shedding.
The input could be expressed, using the YAML format, like this:
mesh:   # Geometry
  lenght: 1.          # x_direction [m]
  width: 0.3          # y direction [m]
  resolution: 0.01    # delta_x [m]
obstacle:
  type: cylinder
  size: 0.05          # diameter [m]
fluid:
  density: 1.2        # [Kg/m3]
  viscosity: 1.8e-5   # [Kg/(m.s)]
  init_speed: 3.      # [m/s]
numerics:
  poisson_tol: 0.05       # [-] tolerance
  poisson_maxsteps: 4     # [it.] max. iterations to converge
  scheme: "first_order"   # either first_order or second order
Generating a SCHEMA
SCHEMA is tied to the JSON serialization format. First we convert the data into a JSON string, using for example an online YAML-to-JSON converter.
The same information is then expressed in a more rigid format, using nested braces {} instead of nested indentation:
{
  "mesh": {
    "lenght": 1,
    "width": 0.3,
    "resolution": 0.01
  },
  "obstacle": {
    "type": "cylinder",
    "size": 0.05
  },
  "fluid": {
    "density": 1.2,
    "viscosity": 0.000018,
    "init_speed": 3
  },
  "numerics": {
    "poisson_tol": 0.05,
    "poisson_maxsteps": 4,
    "scheme": "first_order"
  }
}
Now we infer the SCHEMA adapted to the data, using one of the many online tools. If you have many examples of your data, you could use skinfer, which is able to make much more advanced inferences.
In the present case we get:
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "mesh": {
      "type": "object",
      "properties": {
        "lenght": {
          "type": "integer"
        },
        "width": {
          "type": "number"
        },
        "resolution": {
          "type": "number"
        }
      },
      "required": [
        "lenght",
        "width",
        "resolution"
      ]
    },
    "obstacle": {
      "type": "object",
      "properties": {
        "type": {
          "type": "string"
        },
        "size": {
          "type": "number"
        }
      },
      "required": [
        "type",
        "size"
      ]
    },
    "fluid": {
      "type": "object",
      "properties": {
        "density": {
          "type": "number"
        },
        "viscosity": {
          "type": "number"
        },
        "init_speed": {
          "type": "integer"
        }
      },
      "required": [
        "density",
        "viscosity",
        "init_speed"
      ]
    },
    "numerics": {
      "type": "object",
      "properties": {
        "poisson_tol": {
          "type": "number"
        },
        "poisson_maxsteps": {
          "type": "integer"
        },
        "scheme": {
          "type": "string"
        }
      },
      "required": [
        "poisson_tol",
        "poisson_maxsteps",
        "scheme"
      ]
    }
  },
  "required": [
    "mesh",
    "obstacle",
    "fluid",
    "numerics"
  ]
}
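To see what such an inference tool does, here is a minimal sketch in pure Python. The function name `infer_schema` is hypothetical, and the logic is a simplification written for this article, not the algorithm of any online tool or of skinfer: it looks at a single example and maps each Python type to the corresponding JSON Schema type.

```python
import json


def infer_schema(value):
    """Infer a minimal JSON Schema fragment from one example value."""
    # bool must be tested before int: in Python, bool is a subclass of int
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Simplification: assume every item looks like the first one
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(val) for key, val in value.items()},
            "required": list(value),
        }
    raise TypeError(f"Unsupported type: {type(value).__name__}")


example = json.loads('{"mesh": {"lenght": 1, "width": 0.3, "resolution": 0.01}}')
schema = infer_schema(example)
print(json.dumps(schema, indent=2))
```

Note that lenght and init_speed come out as "integer" only because the single example held the values 1 and 3. An input such as 1.5 would then be rejected, so these two entries are worth widening to "number" by hand.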
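You can now tinker with this SCHEMA, using the SCHEMA reference to make the validation much more precise. For instance, a title, a description, a default value and bounds can be attached to the density (note that a numeric exclusiveMaximum belongs to draft-06 and later; in draft-04 it is a boolean that qualifies maximum):
"density": {
  "title": "Density",
  "description": "The density of the fluid, expressed in Kg/m3.",
  "type": "number",
  "default": 1.2,
  "minimum": 0.001,
  "exclusiveMaximum": 100
}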
You can also restrict the allowed options:
"scheme": {
  "title": "Numerical scheme",
  "description": "The scheme is used to express operators",
  "type": "string",
  "enum": ["centered", "upwind"],
  "default": "centered"
}
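The effect of these two refinements can be checked with a short sketch. It assumes the jsonschema package is installed and uses its `Draft7Validator` class, because a numeric exclusiveMaximum is only understood from draft-06 onwards. The small schema and the helper `failing_keys` are hypothetical, assembled here from the two fragments above.

```python
from jsonschema import Draft7Validator

# Hypothetical schema, assembled from the two fragments above
refined = {
    "type": "object",
    "properties": {
        "density": {
            "type": "number",
            "minimum": 0.001,
            "exclusiveMaximum": 100,
        },
        "scheme": {
            "type": "string",
            "enum": ["centered", "upwind"],
        },
    },
}

validator = Draft7Validator(refined)


def failing_keys(instance):
    """Return the sorted list of top-level keys that fail validation."""
    return sorted(err.absolute_path[0] for err in validator.iter_errors(instance))


ok = failing_keys({"density": 1.2, "scheme": "centered"})
bad = failing_keys({"density": 100, "scheme": "first_order"})
print(ok)
print(bad)
```

The first input passes. In the second one, the density sits exactly on the excluded upper bound and the scheme is not in the enumerated list, so both keys are reported.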
Data validation
Once we have this SCHEMA, we can validate the original data. In Python, you can use for example the jsonschema package. To install it, together with the pyyaml package needed to read the YAML input:
pip install jsonschema pyyaml
Then, to use it:
import yaml
import json
from jsonschema import validate
# read the data
with open('input.yml', "r") as fin:
    data = yaml.load(fin, Loader=yaml.FullLoader)
# read the schema
with open('schema.json', "r") as fin:
    myschema = json.load(fin)
# validation
validate(instance=data, schema=myschema)
What would happen if the keyword scheme was replaced by derivatives in the input? An exception is raised, of course, but with a quite informative output:
Traceback (most recent call last):
  File "valida.py", line 14, in <module>
    validate(instance=data, schema=myschema)
  File "/Users/dauptain/Python_envs/dev_opentea/lib/python3.8/site-packages/jsonschema-3.2.0-py3.8.egg/jsonschema/validators.py", line 934, in validate
    raise error
jsonschema.exceptions.ValidationError: 'scheme' is a required property
Failed validating 'required' in schema['properties']['numerics']:
    {'properties': {'poisson_maxsteps': {'type': 'integer'},
                    'poisson_tol': {'type': 'number'},
                    'scheme': {'type': 'string'}},
     'required': ['poisson_tol', 'poisson_maxsteps', 'scheme'],
     'type': 'object'}
On instance['numerics']:
    {'derivatives': 'first_order',
     'poisson_maxsteps': 4,
     'poisson_tol': 0.05}
Customizing the validation message
This error message is a bit verbose, and can become unreadable if the SCHEMA is too large. However, you can wrap the validate() function of jsonschema to shorten it. In COOP’s package opentea, you can use the validate_light() function like this:
import yaml
import json
from opentea.noob.validate_light import validate_light
# read the data
with open('input.yml', "r") as fin:
    data = yaml.load(fin, Loader=yaml.FullLoader)
# read the schema
with open('schema.json', "r") as fin:
    myschema = json.load(fin)
# validation
validate_light(data, myschema)
It does the same as validate(), with a more human-readable output:
Traceback (most recent call last):
  File "valida.py", line 15, in <module>
    validate_light(data, myschema)
  File "/Users/dauptain/GITLAB/opentea/src/opentea/noob/validate_light.py", line 32, in validate_light
    raise ValidationErrorShort(err_msg)
opentea.noob.validate_light.ValidationErrorShort:
========================
derivatives: first_order
poisson_maxsteps: 4
poisson_tol: 0.05
does not validate against
properties:
  poisson_maxsteps:
    type: integer
  poisson_tol:
    type: number
  scheme:
    type: string
required:
- poisson_tol
- poisson_maxsteps
- scheme
type: object
We have here an efficient and systematic way to validate extremely large and complex information.
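If you do not use opentea, a shorter report can also be built directly on top of jsonschema. The sketch below is not the implementation of validate_light(); the function name `short_report` and the output layout are assumptions made for illustration. It relies on `Draft4Validator.iter_errors()`, which yields every error instead of raising on the first one, and on the `absolute_path` and `message` attributes of each error.

```python
from jsonschema import Draft4Validator

# Reduced version of the inferred schema, limited to the numerics block
schema = {
    "type": "object",
    "properties": {
        "numerics": {
            "type": "object",
            "properties": {
                "poisson_tol": {"type": "number"},
                "poisson_maxsteps": {"type": "integer"},
                "scheme": {"type": "string"},
            },
            "required": ["poisson_tol", "poisson_maxsteps", "scheme"],
        }
    },
    "required": ["numerics"],
}

# Faulty input: 'scheme' is misspelled and 'poisson_tol' is a string
data = {
    "numerics": {
        "derivatives": "first_order",
        "poisson_tol": "0.05",
        "poisson_maxsteps": 4,
    }
}


def short_report(instance, schema):
    """Return one 'location: message' line per validation error."""
    validator = Draft4Validator(schema)
    lines = []
    for err in validator.iter_errors(instance):
        # absolute_path is the sequence of keys leading to the faulty item
        location = "/".join(str(part) for part in err.absolute_path) or "<root>"
        lines.append(f"{location}: {err.message}")
    return sorted(lines)


report = short_report(data, schema)
print("\n".join(report))
```

Compared to validate(), this reports all the problems in one pass, one line each, which scales better to large inputs.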
Extension to an HDF5 structure validation
HDF5 files are also nested objects, and their structure can be scanned as a dictionary. In COOP’s package opentea, the tool get_h5_structure() provides the structure of the file. It is used as follows:
import yaml
from opentea.tools.visit_h5 import get_h5_structure
# scan the HDF5 file into a nested dictionary
dict_ = get_h5_structure("awesome_mesh.h5")
print(yaml.dump(dict_, default_flow_style=False))
The structure looks like this:
Connectivity:
  dtype: int32
  tet->node:
    dtype: int32
    value: array of 181879640 elements
  value: array of 181879640 elements
Coordinates:
  dtype: float64
  value: array of 8043365 elements
  x:
    dtype: float64
    value: array of 8043365 elements
  y:
    dtype: float64
    value: array of 8043365 elements
  z:
    dtype: float64
    value: array of 8043365 elements
(...)
This dictionary can be used like the input file of the initial example. Therefore, one can validate the compatibility of an HDF5 file with a program using the SCHEMA standard.
Data completion
If the SCHEMA is known, it is possible to infer the missing parts of an input by filling them with default values. Assume we have only the following input:
obstacle:
  type: cylinder
  size: 0.09   # diameter [m]
We can infer the full data from the schema. The nob_complete() function is an implementation of this feature in opentea:
import yaml
import json
from opentea.noob.inferdefault import nob_complete
# read the data
with open('input2.yml', "r") as fin:
    data = yaml.load(fin, Loader=yaml.FullLoader)
# read the schema
with open('schema.json', "r") as fin:
    myschema = json.load(fin)
# completion
dict_ = nob_complete(myschema, update_data=data)
print(yaml.dump(dict_, default_flow_style=False))
The result is then:
fluid:
  density: 1.2
  init_speed: 1.4
  viscosity: 1.0e-05
mesh:
  lenght: 3.0
  resolution: 0.01
  width: 1.0
numerics:
  poisson_maxsteps: 10
  poisson_tol: 0.001
  scheme: centered
obstacle:
  size: 0.09
  type: cylinder
Using this property, we can drastically reduce the amount of information needed for a complex setup, if we assume the user wants the default values everywhere else.
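The principle behind such a completion fits in a few lines. The sketch below is not the code of nob_complete(): the function `fill_defaults` is a hypothetical, simplified stand-in that only handles nested objects and the `default` keyword. The default values of the fluid block are those of the result above, while the defaults of the obstacle block are assumptions.

```python
import copy


def fill_defaults(schema, data=None):
    """Return a copy of `data` completed with the `default` values of `schema`."""
    if schema.get("type") == "object":
        result = copy.deepcopy(data) if isinstance(data, dict) else {}
        for key, subschema in schema.get("properties", {}).items():
            if key in result:
                # Recurse so that nested objects are completed as well
                result[key] = fill_defaults(subschema, result[key])
            elif subschema.get("type") == "object" or "default" in subschema:
                result[key] = fill_defaults(subschema)
        return result
    # Leaf: the user value wins, the schema default is the fallback
    if data is not None:
        return data
    return copy.deepcopy(schema.get("default"))


schema = {
    "type": "object",
    "properties": {
        "obstacle": {
            "type": "object",
            "properties": {
                "type": {"type": "string", "default": "cylinder"},  # assumed default
                "size": {"type": "number", "default": 0.05},        # assumed default
            },
        },
        "fluid": {
            "type": "object",
            "properties": {
                "density": {"type": "number", "default": 1.2},
                "init_speed": {"type": "number", "default": 1.4},
            },
        },
    },
}

partial = {"obstacle": {"size": 0.09}}
full = fill_defaults(schema, partial)
print(full)
```

The user-provided size is kept, the missing obstacle type is taken from the schema, and the whole fluid block is created from its defaults. The input dictionary itself is left untouched.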
Create graphical user interfaces from SCHEMA
SCHEMA was created to allow two web services to exchange information, and the usual way to get information from the end user is a form. There are therefore plenty of ways to generate forms from a SCHEMA, in other words to generate a Graphical User Interface:
- JSON Editor (plain JS) (demo)
- Angular Schema Form (AngularJS) (website)
- React JSONSchema Form (React) (demo)
Finally, if you want to generate a GUI using Python/TkInter the same way, you can have a look again at the GUI section of opentea.
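All these generators share the same idea: walk the SCHEMA and map each entry to a widget. The sketch below illustrates this mapping without any GUI toolkit. The function `form_fields` and the widget names are hypothetical, and do not correspond to the internals of the tools listed above or of opentea.

```python
def form_fields(schema, prefix=""):
    """Flatten a schema into a list of form-field descriptions."""
    fields = []
    for key, sub in schema.get("properties", {}).items():
        path = f"{prefix}{key}"
        if sub.get("type") == "object":
            # Nested objects become groups of fields
            fields.extend(form_fields(sub, prefix=f"{path}."))
            continue
        if "enum" in sub:
            widget = "dropdown"
        elif sub.get("type") in ("number", "integer"):
            widget = "number entry"
        elif sub.get("type") == "boolean":
            widget = "checkbox"
        else:
            widget = "text entry"
        fields.append({
            "path": path,
            "label": sub.get("title", key),  # fall back on the key itself
            "widget": widget,
            "default": sub.get("default"),
        })
    return fields


schema = {
    "type": "object",
    "properties": {
        "fluid": {
            "type": "object",
            "properties": {
                "density": {"title": "Density", "type": "number", "default": 1.2},
            },
        },
        "numerics": {
            "type": "object",
            "properties": {
                "scheme": {
                    "type": "string",
                    "enum": ["centered", "upwind"],
                    "default": "centered",
                },
            },
        },
    },
}

fields = form_fields(schema)
for field in fields:
    print(field["path"], "->", field["widget"], "| default:", field["default"])
```

A real generator would then create one widget per description, pre-filled with the default value, and use the title and description keywords as label and tooltip.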
Takeaway
SCHEMA is a core component of data exchange on the web today. The number of tools and people working with this technology is huge, far larger than what the small HPC field can offer.
If you are in the HPC business, see for yourself what SCHEMA can do for your setups using this interactive page. It could replace, for free, many in-house parsers and global validation processes.
This work has been supported by the EXCELLERAT project which has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 823691.