Nested Python Dataclasses With List Annotations
Solution 1:
This doesn't really answer your question about the nested decorators, but my initial suggestion would be to avoid a lot of hard work for yourself by making use of libraries that have tackled this same problem before.
There are a lot of well-known ones, like pydantic, which also provides data validation and is something I might recommend. If you are interested in keeping your existing dataclass structure and don't want to inherit from anything, you can use libraries such as dataclass-wizard and dataclasses-json. The latter offers a decorator approach which might interest you. But ideally, the goal is to find an (efficient) JSON serialization library which already offers exactly what you need.
Here is an example using the dataclass-wizard library with minimal changes needed (no need to inherit from a mixin class). Note that I had to modify your input JSON object slightly, as it didn't exactly match the dataclass schema. Otherwise, it looks like it should work as expected. I've also removed copy.deepcopy, as that's a bit slower and we don't need it (the helper functions won't directly modify the dict objects anyway, which is simple enough to test):
from dataclasses import dataclass
from typing import List

from dataclass_wizard import fromlist


@dataclass
class IterationData:
    question1: str
    question2: str


@dataclass
class IterationResult:
    name: str
    data: IterationData


@dataclass
class IterationResults:
    iterations: List[IterationResult]


@dataclass
class InstanceData:
    date: str
    owner: str


@dataclass
class Instance:
    data: InstanceData
    name: str


@dataclass
class Result:
    status: str
    iteration_results: IterationResults


@dataclass
class MergedInstance:
    instance: Instance
    result: Result


single_instance = {
    "instance": {
        "name": "example1",
        "data": {
            "date": "2021-01-01",
            "owner": "Maciek"
        }
    },
    "result": {
        "status": "complete",
        "iteration_results": {
            # Notice I've changed this here - previously the syntax was
            # invalid (this was a list)
            "iterations": [
                {
                    "name": "first",
                    "data": {
                        "question1": "yes",
                        "question2": "no"
                    }
                }
            ]
        }
    }
}

instances = [single_instance for i in range(3)]  # created a list just to resemble mydata

objres = fromlist(MergedInstance, instances)

for obj in objres:
    print(obj)
Using the dataclasses-json library:
from dataclasses import dataclass
from typing import List

from dataclasses_json import dataclass_json


# Same as above
...


@dataclass_json
@dataclass
class MergedInstance:
    instance: Instance
    result: Result


single_instance = {...}

instances = [single_instance for i in range(3)]  # created a list just to resemble mydata

objres = [MergedInstance.from_dict(inst) for inst in instances]

for obj in objres:
    print(obj)
Bonus: Let's say you are calling an API that returns you a complex JSON response, such as the one above. If you want to convert this JSON response to a dataclass schema, normally you'll have to write it out by hand, which can be a bit tiresome if the structure of the JSON is especially complex.
Wouldn't it be cool if there was a way to simplify the generation of a nested dataclass structure? The dataclass-wizard library comes with a CLI tool that accepts an arbitrary JSON input, so it should certainly be doable to auto-generate a dataclass schema given such an input.
Assume you have these contents in a testing.json file:
{"instance":{"name":"example1","data":{"date":"2021-01-01","owner":"Maciek"}},"result":{"status":"complete","iteration_results":{"iterations":[{"name":"first","data":{"question1":"yes","question2":"no"}}]}}}
Then we run the following command:
wiz gs testing testing
And the contents of our new testing.py file:
from dataclasses import dataclass
from datetime import date
from typing import List, Union

from dataclass_wizard import JSONWizard


@dataclass
class Data(JSONWizard):
    """
    Data dataclass
    """
    instance: 'Instance'
    result: 'Result'


@dataclass
class Instance:
    """
    Instance dataclass
    """
    name: str
    data: 'Data'


@dataclass
class Data:
    """
    Data dataclass
    """
    date: date
    owner: str


@dataclass
class Result:
    """
    Result dataclass
    """
    status: str
    iteration_results: 'IterationResults'


@dataclass
class IterationResults:
    """
    IterationResults dataclass
    """
    iterations: List['Iteration']


@dataclass
class Iteration:
    """
    Iteration dataclass
    """
    name: str
    data: 'Data'


@dataclass
class Data:
    """
    Data dataclass
    """
    question1: Union[bool, str]
    question2: Union[bool, str]
That appears to more or less match the nested dataclass structure from the original question, and best of all we didn't need to write any of the code ourselves!
However, there's a minor problem: because of some duplicate JSON keys, we end up with three dataclasses named Data, and in Python a later class definition replaces an earlier one with the same name. So I went ahead and renamed them to Data1, Data2, and Data3 for uniqueness. Then we can do a quick test to confirm that we're able to load the same JSON data into our new dataclass schema:
import json
from dataclasses import dataclass
from datetime import date
from typing import List, Union

from dataclass_wizard import JSONWizard


@dataclass
class Data1(JSONWizard):
    """
    Data dataclass
    """
    instance: 'Instance'
    result: 'Result'


@dataclass
class Instance:
    """
    Instance dataclass
    """
    name: str
    data: 'Data2'


@dataclass
class Data2:
    """
    Data dataclass
    """
    date: date
    owner: str


@dataclass
class Result:
    """
    Result dataclass
    """
    status: str
    iteration_results: 'IterationResults'


@dataclass
class IterationResults:
    """
    IterationResults dataclass
    """
    iterations: List['Iteration']


@dataclass
class Iteration:
    """
    Iteration dataclass
    """
    name: str
    data: 'Data3'


@dataclass
class Data3:
    """
    Data dataclass
    """
    question1: Union[bool, str]
    question2: Union[bool, str]


# ---- Start of our test
with open('testing.json') as in_file:
    d = json.load(in_file)

c = Data1.from_dict(d)
print(repr(c))
# Data1(instance=Instance(name='example1', data=Data2(date=datetime.date(2021, 1, 1), owner='Maciek')), result=Result(status='complete', iteration_results=IterationResults(iterations=[Iteration(name='first', data=Data3(question1='yes', question2='no'))])))
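As a side note on why the renaming was needed, here is a small stdlib-only sketch of what happens when two dataclasses share a name in one module. The name first_data is hypothetical and only there to keep a handle on the first class for comparison; the field names are borrowed from the example above.

```python
from dataclasses import dataclass, fields


@dataclass
class Data:
    date: str
    owner: str


# Hypothetical helper name: keeps a reference to the first definition.
first_data = Data


@dataclass
class Data:  # same name again, so this rebinds `Data` in the module
    question1: str
    question2: str


# `Data` now refers only to the second class.
field_names = [f.name for f in fields(Data)]
print(field_names)         # ['question1', 'question2']
print(Data is first_data)  # False
```

So any annotation that refers to Data by name after this point would pick up the last definition, which is why giving each class a unique name is the safer route.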
Solution 2:
Use from_dict from the dacite library. This is what you need in order to handle nested dataclasses.
from dataclasses import dataclass

from dacite import from_dict


@dataclass
class User:
    name: str
    age: int
    is_active: bool


data = {
    'name': 'John',
    'age': 30,
    'is_active': True,
}

user = from_dict(data_class=User, data=data)
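To give a rough idea of what a from_dict-style helper has to do for nested dataclasses with List annotations, here is a minimal stdlib-only sketch. It is not dacite's implementation: build is a hypothetical helper written just for illustration, it assumes the annotations are real types (not strings), and it only handles dataclasses, List[...] and plain values. The dataclasses are the ones from the question.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List, get_args, get_origin


def build(tp: Any, value: Any) -> Any:
    """Hypothetical helper: recursively convert `value` to match annotation `tp`."""
    if is_dataclass(tp) and isinstance(value, dict):
        # Nested dataclass: convert each field using its own annotation.
        kwargs = {f.name: build(f.type, value[f.name]) for f in fields(tp)}
        return tp(**kwargs)
    if get_origin(tp) is list:
        # List[X]: convert every element using the item annotation X.
        (item_type,) = get_args(tp)
        return [build(item_type, item) for item in value]
    # Anything else (str, int, ...) is passed through unchanged.
    return value


@dataclass
class IterationData:
    question1: str
    question2: str


@dataclass
class IterationResult:
    name: str
    data: IterationData


@dataclass
class IterationResults:
    iterations: List[IterationResult]


raw = {
    "iterations": [
        {"name": "first", "data": {"question1": "yes", "question2": "no"}}
    ]
}

results = build(IterationResults, raw)
print(results)
```

A real library also deals with Optional, Union, forward references, type checking and error reporting, which is the main reason to prefer one over a hand-rolled helper like this.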
Solution 3:
You can actually nest dataclasses directly in a definition, and it works fairly well. Take a look at a post of mine where I was trying to solve a similar problem a while back: "Python nested dataclasses ...is this valid?"
Or you can define a 'child' dataclass, and have that as the type of an element in a 'parent' container dataclass.
I still use this approach in production code today, and it works well (I also use dataclasses-json, as someone mentioned, for JSON serialisation, and I also validate consistency).
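As a minimal sketch of the child/parent arrangement described above, using only the standard library: the class and field names (Child, Parent, title, children) are made up for illustration and are not taken from my production code.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Child:
    name: str
    score: int


@dataclass
class Parent:
    title: str
    # The child dataclass is the element type of the parent's list field.
    children: List[Child] = field(default_factory=list)


parent = Parent("group", [Child("a", 1), Child("b", 2)])

# asdict() recurses into nested dataclasses, including those inside lists.
as_dict = asdict(parent)
payload = json.dumps(as_dict)
print(payload)
```

Going in this direction (objects to JSON) is the easy half with plain dataclasses. Loading JSON back into the nested structure is where a helper library tends to pay off.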
I also twisted nested dataclasses to allow exporting of JSON schemas based on their definitions. It's not simple, but it is doable. (For our use case, exporting data for import by a Node.js application, a JSON schema was necessary.)
However, as the first reply mentioned, there is a better approach (likely in your case), which is to use pydantic. If you're starting almost from scratch, I'd recommend going with that.
It's on my to-do list for our production code to refactor it to use pydantic instead of nested dataclasses. Nested dataclasses do work, and you can get them to do JSON serialisation and self-validation against their defined typing, but it's a bit of a pain IMHO.
This is what pydantic was designed to do, and (again IMHO) it does so a lot more simply and cleanly out of the box.