This short post will present a way to deal with a common problem in nested objects: the presence of indexes in paths. Take for example
example_dict["access"]["to"]["certain"]["level"][1]["access"]["other"]["level"]
How do we navigate through such a nested object without prior knowledge of the specific index (here 1)? The proposed solution relies on the Nob Python package. A dedicated blog post discusses the common usage of Nob adopted at COOP. In the present post we will dive into a slightly different usage.
reading time: 7 min
The context
We’ll work with a generic example to illustrate the problem we wish to tackle.
We have a list of properties owned by several people in different locations. We’ll cast this list in YAML format and name it input.yml. Its content is given below.
# content of input.yml
europe:
  france:
    toulouse:
      rue_alsace:
        - person: sarah_connor
          type: house
        - person: jack_burton
          type: house
      rue_lautrec:
        - person: chuck_norris
          type: house
        - person: sarah_connor
          type: house
        - person: john_mclane
          type: flat
          apt: 31
    paris:
      rue_alsace:
        - person: sarah_connor
          type: flat
          apt: 44
        - person: john_doe
          type: flat
          apt: 43
      rue_rivoli:
        - person: john_carter
          type: house
        - person: bruce_wayne
          type: mansion
  spain:
    madrid:
      castellana:
        - person: bruce_wayne
          type: mansion
    valencia:
      sanpedro:
        - person: sancho_panca
          type: hut
mars:
  olympus_mons:
    barsoom:
      - person: john_carter
        type: palace
The desired outcome
Given the list (input.yml), we wish to modify some of the entries within a Python script. This is an ordinary task, and chances are high you will encounter it in one form or another in scientific computing (editing inputs, outputs, etc.).
Suggested scenario
Bruce Wayne got in a financial pinch due to the COVID-19 crisis. He decides
- to downgrade his property in Madrid by selling his mansion and acquiring a flat in the same district
- to sell his mansion in Paris
Let’s modify the above list accordingly.
The issue
We can read the input.yml file in Python, yielding a dictionary (mydict)
import yaml

# parse the YAML file into plain Python dicts and lists
with open("input.yml", 'r') as fin:
    mydict = yaml.load(fin, Loader=yaml.FullLoader)
which will result in a nested dictionary.
Now, to update the list according to the suggested scenario, we would write:
- for the downgrade
mydict["europe"]["spain"]["madrid"]["castellana"][0]["type"] = "flat"
- for the sale (we will add a status key)
mydict["europe"]["france"]["paris"]["rue_rivoli"][1]["status"] = "for sale"
Both actions require knowledge of the index ([0], [1]) corresponding to the properties listed for Bruce Wayne. The issue arises because of the occurrence of dictionaries inside lists which are themselves inside a dictionary: dict[list[dict]].
In itself, the above achieves what we want, but
- it depends on knowing in advance the exact index of Bruce Wayne’s properties within the nested dictionary, which is fragile and can easily lead to errors. A “smarter” way would be to look up the index of the item we wish to modify, here based on the owner’s name, thus removing the need for a priori knowledge.
- it is cumbersome to have to spell out every level of the nested dictionary before reaching the end point.
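The index lookup by owner name mentioned in the first point can be sketched in plain Python, without Nob. The excerpt of mydict below and the helper name index_of_owner are assumptions made for illustration only.

```python
# Hypothetical excerpt of the nested dictionary loaded from input.yml
mydict = {
    "europe": {
        "france": {
            "paris": {
                "rue_rivoli": [
                    {"person": "john_carter", "type": "house"},
                    {"person": "bruce_wayne", "type": "mansion"},
                ]
            }
        }
    }
}


def index_of_owner(properties, owner):
    """Return the position of the single entry owned by `owner`."""
    matches = [
        i for i, entry in enumerate(properties) if entry.get("person") == owner
    ]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one entry for {owner!r}, found {len(matches)}"
        )
    return matches[0]


rue_rivoli = mydict["europe"]["france"]["paris"]["rue_rivoli"]
idx = index_of_owner(rue_rivoli, "bruce_wayne")  # no hard-coded [1]
rue_rivoli[idx]["status"] = "for sale"
```

This removes the hard-coded index, but the full chain of keys down to rue_rivoli still has to be written out, which is the second point above.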
So let’s explore another way to do so.
The alternative
Instead of working with a standard dictionary we will rely on the Nob package which offers an elegant way to manipulate nested objects.
Let’s convert our dictionary into a Nob object.
from nob import Nob
nob_tree = Nob(mydict)
For Bruce Wayne’s property downgrade, there are several ways to access the key we wish to manipulate (see also the PyPI description of Nob).
# full path
nob_tree["/europe/spain/madrid/castellana/0/type"][:]
# shorter path 1
nob_tree.castellana["/0/type"][:]
# shorter path 2
nob_tree.castellana[0].type[:]
# shortest path
nob_tree.castellana.type[:]
Note: appending [:] returns the value associated with the nested key.
The full path is similar to what we would write with a dictionary, with the advantage of specifying an absolute path at once rather than through multiple [], but it remains cumbersome. The advantage of Nob becomes visible in the other options. First, we can shorten the part of the path that precedes the index, as in the shorter path alternatives: nob_tree.castellana. This relies on the uniqueness of the key (here: castellana) in the nested object. Shorter path 2 then accesses the type of property associated with Bruce Wayne in a similar manner, again relying on the uniqueness of the “type” key within the subtree. The issue remains that we need to know the index [0] to continue our search. In this specific case, an even quicker method is possible, as shown by shortest path, which relies on the fact that there is only one owner and property listed under castellana.
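To see why the short forms depend on key uniqueness, here is a plain-Python sketch that lists every path ending at a given key. It is not Nob's implementation; the function name paths_to_key and the trimmed-down tree are assumptions for illustration.

```python
def paths_to_key(node, key, prefix=()):
    """Yield every path (as a tuple) that ends at a dict key named `key`."""
    if isinstance(node, dict):
        for k, child in node.items():
            if k == key:
                yield prefix + (k,)
            yield from paths_to_key(child, key, prefix + (k,))
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from paths_to_key(child, key, prefix + (i,))


# Trimmed-down version of input.yml (street contents mostly omitted)
tree = {
    "europe": {
        "france": {
            "toulouse": {"rue_alsace": []},
            "paris": {"rue_alsace": [], "rue_rivoli": []},
        },
        "spain": {
            "madrid": {
                "castellana": [{"person": "bruce_wayne", "type": "mansion"}]
            }
        },
    }
}

castellana_paths = list(paths_to_key(tree, "castellana"))
rue_alsace_paths = list(paths_to_key(tree, "rue_alsace"))
```

Here castellana resolves to a single path, so a short form is unambiguous, whereas rue_alsace appears under both toulouse and paris and would need a longer path to tell the two apart.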
In the second case we could use
nob_tree.rue_rivoli[1]["status"] = "for sale"
which still requires an index.
Let’s describe a generic approach to avoid the need for knowledge on the indexes.
In the present case we will assume that the “castellana” and “rue_rivoli” keys are unique and that the user knows this.
Solution with list comprehension
Making use of list comprehension, Bruce Wayne’s property type can readily be obtained. We first define an intermediate subtree, followed by a search operation within that subtree.
sub_tree = nob_tree.castellana
out_ = [
    path[:-1]
    for path in sub_tree.find("person")
    if sub_tree[path][:] == "bruce_wayne"
]
bruce_path, = out_  # the trailing comma unpacks and checks that only one value was found
The method relies on the find() functionality of the nested object. Note that the lookup could be written in a single line, as follows, at the cost of reduced readability.
bruce_path, = [path[:-1] for path in sub_tree.find("person") if sub_tree[path][:] == "bruce_wayne"]
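The trailing comma in `bruce_path, = ...` is ordinary Python sequence unpacking: it succeeds only if the right-hand side holds exactly one element. A minimal sketch of that behaviour, independent of Nob (the helper name unpack_single and the "/0", "/1" strings are placeholders, not actual Nob paths):

```python
def unpack_single(matches):
    """Return the only element of `matches`, or raise ValueError otherwise."""
    value, = matches  # same idiom as `bruce_path, = [...]`
    return value


one = unpack_single(["/0"])

# Zero matches and multiple matches both fail loudly instead of silently
# picking an arbitrary entry.
errors = []
for candidates in ([], ["/0", "/1"]):
    try:
        unpack_single(candidates)
    except ValueError as exc:
        errors.append(str(exc))
```

If a person could legitimately own several properties on the same street, the unpacking would have to be replaced by a loop over all matches.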
Then we can perform our modification as follows
# check current property type
In [1]: sub_tree[bruce_path].type[:]
Out[1]: 'mansion'
# set our desired value
sub_tree[bruce_path].type = "flat"
# check if everything worked fine
In [3]: sub_tree[bruce_path].type[:]
Out[3]: 'flat'
For the sale of Bruce Wayne’s mansion in Paris we follow the same approach, working with a different subtree.
# define new subtree
sub_tree = nob_tree.rue_rivoli
# use same procedure as previously
bruce_path, = [path[:-1] for path in sub_tree.find("person") if sub_tree[path][:] == "bruce_wayne"]
# add key
sub_tree[bruce_path]["status"] = "for sale"
If multiple keys are to be manipulated, it can be handy to write a function, still relying on a list comprehension, that returns the path to the person of interest, or even the property type associated with that person.
Final comments
The suggested method does not depend on the presence of lists within nested objects. It is in fact a filtering task, which could just as well be performed on any other nested object, and hence is not limited to cases where indexes occur. Take for example the following database:
europe:
  france:
    person: bruce_wayne
    type: mansion
  italy:
    person: bruce_wayne
    type: flat
If we wish to get the types of properties associated with Bruce Wayne in the above example we would have to perform a similar search operation as detailed in this post’s context but no indexing would be encountered.
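As a Nob-independent illustration of this filtering view, the sketch below walks any mix of dicts and lists and collects the entries matching a key/value pair. The function name find_matching is an assumption; with Nob, the same role is played by find() combined with a list comprehension as shown earlier.

```python
def find_matching(node, key, value, path=()):
    """Yield (path, mapping) for every dict where mapping[key] == value."""
    if isinstance(node, dict):
        if node.get(key) == value:
            yield path, node
        for k, child in node.items():
            yield from find_matching(child, key, value, path + (k,))
    elif isinstance(node, list):
        # lists are handled too, but nothing requires them to be present
        for i, child in enumerate(node):
            yield from find_matching(child, key, value, path + (i,))


database = {
    "europe": {
        "france": {"person": "bruce_wayne", "type": "mansion"},
        "italy": {"person": "bruce_wayne", "type": "flat"},
    }
}

bruce_types = {
    path: entry["type"]
    for path, entry in find_matching(database, "person", "bruce_wayne")
}
```

The resulting paths contain only keys, no integer indexes, yet the search logic is the same as for the list-based input.yml.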
Take away
Look into Nob and get a grasp of its use. It will save you the effort of writing your own tools to search through nested objects and manipulate them at will.