Photo by Paul Hanaoka on Unsplash

Fetching the desired data from a very large tree can be done in many ways. Here are the solutions we found at COOP, illustrated with several in-house use cases involving Opentea and PyAVBP.

Reading time: 3.77 minutes sharp

The example data tree

Here is a typical data tree we work on at COOP, expressed in YAML. It is a setup for a high-fidelity simulation software.

boundaries:
  boundary_stator:
    mul_bnd_stator:
    - bnd_gas:
        inlet_relax_massflow:
          mass_flow: 0.1
          relax_coeff:
            relax_coeff_inlet_relax: 200.0
      name: _Inlet
    - bnd_gas:
        inlet_relax_massflow:
          mass_flow: 0.33
          relax_coeff:
            relax_coeff_inlet_relax: 200.0
      name: _Inlet2
    - bnd_gas:
        outlet_relax_pressure:
          pressure: 101325.0
          relax_coeff_p: 50.0
      name: _Outlet
    - bnd_gas:
        wall_law_adiab: null
      name: _WallIn1
    - bnd_gas:
        wall_law_adiab: null
      name: _WallIn2
    - bnd_gas:
        wall_law_adiab: null
      name: _WallIn3
  mass_balance:
    balance_comment: 'Multiperf  massflow [kg/s] : 0.0000

      Total inlet mass flow [kg/s] : 0.1000

      '
    total_airflow_rate: 0.1
  reserved_patches:
    list_inj_patches:
    - no_injection_patch
combustion:
  mixture:
    kero_LUCHE-AIR-2S-BFER_FLAMMABLE: null
  reactive_type:
    reactive:
      efficiency:
        efficiency_coeff:
          efficiency_constant: 0.5
        efficiency_model: charlette
      phi_type:
        local_phi:
          table_path: default pyavbp table
      thickening_type:
        constant:
          thick_factor: 7.0

This sample has been reduced to fit in this post. The original files can be more than 2000 YAML lines long, with more than 10 nesting levels.

Most of the information is stored here as values, which are the leaves of the tree, such as thick_factor: 7.0. However, information is also stored in node names: for example, each boundary features a placeholder bnd_gas node whose unique child is a subtree characteristic of the treatment to apply:

  • one inlet_relax_massflow with related parameters,
  • one outlet_relax_pressure with related parameters,
  • and several wall_law_adiab with no parameters needed.

Now, let’s see the use cases.

Simple use cases in fetching data

Using the natural Python approach

The most straightforward way to fetch the value of thick_factor is to use the nested dicts:

import yaml
with open('input.yml', "r") as fin:
    data = yaml.load(fin, Loader=yaml.SafeLoader)

target = data["combustion"]["reactive_type"]["reactive"]["thickening_type"]["constant"]["thick_factor"]
print(target)
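
The chained lookup above can also be written as a walk along a list of keys, which is handy when the address is built at run time. The sketch below is only an illustration: `get_by_path` is a hypothetical helper name, and the small `data` dictionary is a hand-copied excerpt of the sample tree rather than the loaded YAML file.

```python
from functools import reduce
from operator import getitem


def get_by_path(tree, path):
    """Follow `path` (dict keys or list indices) down from `tree`."""
    return reduce(getitem, path, tree)


# Hand-copied excerpt of the sample tree (assumption: same layout as input.yml)
data = {
    "combustion": {
        "reactive_type": {
            "reactive": {
                "thickening_type": {"constant": {"thick_factor": 7.0}}
            }
        }
    }
}

path = ["combustion", "reactive_type", "reactive",
        "thickening_type", "constant", "thick_factor"]
thick_factor = get_by_path(data, path)
print(thick_factor)
```

A missing key still raises the usual `KeyError`, exactly as the chained brackets would.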

If a list is somewhere in the address, the process is similar. For example, the pressure at the outlet is accessed this way:

target2 = data["boundaries"]["boundary_stator"]["mul_bnd_stator"][2]["bnd_gas"]["outlet_relax_pressure"]["pressure"]

Using the Nob library

The nob library makes these two lines easier, because it can search the path for you:

import yaml
import nob
with open('input.yml', "r") as fin:
    data = yaml.load(fin, Loader=yaml.SafeLoader)
tree_nob = nob.Nob(data)

target = tree_nob.thick_factor[:]
target2 = tree_nob.pressure[:]
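
To make "searching the path for you" concrete, here is a minimal pure-Python sketch of a key search over nested dicts and lists. It is not Nob's implementation: `find_paths` and `fetch_unique` are hypothetical names, and `data` is a simplified, hand-copied excerpt of the sample tree.

```python
def find_paths(tree, key, prefix=()):
    """Yield the path (tuple of keys/indices) of every node named `key`."""
    if isinstance(tree, dict):
        children = tree.items()
    elif isinstance(tree, list):
        children = enumerate(tree)
    else:
        return  # a leaf: nothing below to explore
    for name, child in children:
        path = prefix + (name,)
        if name == key:
            yield path
        yield from find_paths(child, key, path)


def fetch_unique(tree, key):
    """Return the value stored under `key`, which must appear exactly once."""
    (path,) = find_paths(tree, key)  # ValueError if zero or several matches
    node = tree
    for step in path:
        node = node[step]
    return node


# Simplified excerpt of the sample tree (assumption for this illustration)
data = {
    "boundaries": {
        "boundary_stator": {
            "mul_bnd_stator": [
                {"bnd_gas": {"inlet_relax_massflow": {"mass_flow": 0.1}},
                 "name": "_Inlet"},
                {"bnd_gas": {"inlet_relax_massflow": {"mass_flow": 0.33}},
                 "name": "_Inlet2"},
                {"bnd_gas": {"outlet_relax_pressure": {"pressure": 101325.0}},
                 "name": "_Outlet"},
            ]
        }
    },
    "combustion": {"thickening_type": {"constant": {"thick_factor": 7.0}}},
}

print(fetch_unique(data, "thick_factor"))
print(fetch_unique(data, "pressure"))
mass_flow_paths = list(find_paths(data, "mass_flow"))
print(len(mass_flow_paths), "matches for mass_flow")
```

In this sketch a key that appears several times, such as `mass_flow`, makes `fetch_unique` raise a `ValueError`, so an ambiguous short name has to be narrowed down first.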

When the path is less obvious, with several potential matches, you can break the access into two steps:

bnd2_nob = tree_nob.mul_bnd_stator[1]
target = bnd2_nob.mass_flow

So you can work with nobs.

nob                     #the full tree
nob.attr1               #an attribute
for item in nob.node:   #loop
    (...)

Here, all the returned elements you are working on are Nob trees. However, at any moment you can switch back to your Python comfort zone: simply add [:].

nob[:]                     #the full tree, back in pure python
nob.attr1[:]               #an attribute, back in pure python
for item in nob.node[:]:   #loop, back in pure python
    (...)

Fetching a node name

Using the natural Python approach

When you want a node name, the classical approach is via the keys of the dictionary:

bnd2 = data["boundaries"]["boundary_stator"]["mul_bnd_stator"][1]
target = list(bnd2["bnd_gas"].keys())[0]

There is a shorter version:

target, = bnd2["bnd_gas"].keys()

Indeed, the value of .keys() is a dict_keys object, dict_keys(['inlet_relax_massflow']). By unpacking the keys into a tuple of length 1, we also check that the "bnd_gas" node has one single key. If there were two keys, the error would read ValueError: too many values to unpack (expected 1).
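
The single-element unpacking can be tried on plain dictionaries. In this small sketch, `one_key` and `two_keys` are made-up stand-ins for a `bnd_gas` node:

```python
# Made-up stand-ins for a "bnd_gas" node (assumption for this illustration)
one_key = {"inlet_relax_massflow": {"mass_flow": 0.33}}
two_keys = {"inlet_relax_massflow": None, "wall_law_adiab": None}

# Exactly one key: the unpacking succeeds
target, = one_key.keys()
print(target)

# Two keys: the unpacking fails, which acts as a free sanity check
try:
    target_bad, = two_keys.keys()
except ValueError as err:
    message = str(err)
    print("ValueError:", message)
```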

We will however prefer the shortest version:

target, = bnd2["bnd_gas"]

This is the same line as before because, in Python, iterating over a dictionary is iterating over its keys.

Using the Nob library

Of course, you can get the same result via a nob tree:

bnd2_nob = tree_nob.mul_bnd_stator[1]
target, = bnd2_nob.bnd_gas[:]

Indeed, the value of bnd2_nob.bnd_gas[:] is the full dictionary:

{'inlet_relax_massflow': {'mass_flow': 0.33, 'relax_coeff': {'relax_coeff_inlet_relax': 200.0}}}

In this case, unpacking into a tuple of length 1 gets the key.

The top Nob stunts

Looping over the tree

For a simple loop, we can iterate over a Python list:

bnd2_list = tree_nob.mul_bnd_stator[:]
for item in bnd2_list:
    print("str_ ",item)
    print("repr ",repr(item))
    # dict access
    print("value", item["bnd_gas"])

Here, note that your data at the address mul_bnd_stator is stored as a list. As in Python, iterating over a list is iterating over its items, so each item here is a full dictionary.

The alternative is to iterate over Nob representations of the items.

bnd2_nob = tree_nob.mul_bnd_stator
for item_nob in bnd2_nob:
    print("str_ ",item_nob)
    print("repr ",repr(item_nob))
    # nob access
    print("value", item_nob.bnd_gas)

This is because in nob, iterating over a node is iterating over its children, returned as Nob objects.

Note: In Python, iterating over a dict returns keys, while iterating over a list returns the items. Using nob, both iterations are done the same way, with no special case required.
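
The difference is easy to reproduce in plain Python. The sketch below also shows one possible way to get a uniform loop; `iter_children` is a hypothetical helper written for this illustration, not part of Nob, and the two containers are made-up samples.

```python
def iter_children(node):
    """Yield (name, child) pairs for a dict or a list, and nothing for a leaf."""
    if isinstance(node, dict):
        yield from node.items()
    elif isinstance(node, list):
        yield from enumerate(node)


# Made-up samples (assumption for this illustration)
as_dict = {"mass_flow": 0.1, "relax_coeff": 200.0}
as_list = ["_Inlet", "_Outlet"]

print(list(as_dict))   # plain iteration over a dict gives its keys
print(list(as_list))   # plain iteration over a list gives its items

# With the helper, both containers are visited the same way
dict_children = list(iter_children(as_dict))
list_children = list(iter_children(as_list))
print(dict_children)
print(list_children)
```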

Looping on a query about nodes

Now that we can loop, we might want to limit our action to some node names in the loop:

bnd2_nob = tree_nob.mul_bnd_stator
for item_nob in bnd2_nob:
    bnd_type, = item_nob.bnd_gas[:]
    if bnd_type == "inlet_relax_massflow":
        print(item_nob.relax_coeff_inlet_relax)

But the .find() method is more suited to this purpose:

inlet_paths = tree_nob.find("inlet_relax_massflow")
for path in inlet_paths:
    print(tree_nob[path].relax_coeff_inlet_relax)

Queries about values

Sometimes we want to identify a subtree according to one of its values. The long way, which we often end up with, is:

bnd2_nob = tree_nob.mul_bnd_stator

for item_nob in bnd2_nob:
    bnd_name = item_nob.name[:]
    if bnd_name == "_Outlet":
        outlet_tree = item_nob
        break

print(outlet_tree)

However, you should reduce this to one line:

bnd2_nob = tree_nob.mul_bnd_stator

outlet_tree, = [item_nob for item_nob in bnd2_nob if item_nob.name[:] == "_Outlet"]
print(outlet_tree)
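
The same single-match idiom works on plain Python data. The sketch below uses a hand-copied excerpt of the boundary list; `select_by_name` is a hypothetical helper that only wraps the one-liner to give a clearer error message.

```python
# Hand-copied excerpt of the boundary list (assumption for this illustration)
boundaries = [
    {"bnd_gas": {"inlet_relax_massflow": {"mass_flow": 0.1}}, "name": "_Inlet"},
    {"bnd_gas": {"outlet_relax_pressure": {"pressure": 101325.0}}, "name": "_Outlet"},
    {"bnd_gas": {"wall_law_adiab": None}, "name": "_WallIn1"},
]


def select_by_name(items, name):
    """Return the single item whose "name" equals `name`."""
    matches = [item for item in items if item["name"] == name]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one '{name}', found {len(matches)}")
    return matches[0]


# One-liner: the unpacking fails unless there is exactly one match
outlet, = [item for item in boundaries if item["name"] == "_Outlet"]

# Same result through the helper
same_item = select_by_name(boundaries, "_Outlet") is outlet
print(same_item)
print(outlet["bnd_gas"]["outlet_relax_pressure"]["pressure"])
```

Both forms fail loudly when the name is missing or duplicated, which is usually what you want when a boundary name is supposed to be unique.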

Takeaway, for COOP people

If you are dealing with Nob, read and re-read the Nob documentation. If a nob query feels weird or complex, discuss it ASAP with your fellows. It should stay on the simple side of life. Prefer working on Nob trees in your data fetching. This abstraction will help us to monitor and improve our dataflow. Finally, for the sake of performance, stick to the Nob cookbook.

(Photography Bon Vivant on Unsplash) Good luck fetching your data…

Antoine Dauptain is a research scientist focused on computer science and engineering topics for HPC.
