The Data in Data Class
Python has well-known data holders. If you need a sequence, you can use a list. If you need an immutable sequence, use a tuple. If you need a homogeneous array, use the array module. If you need a hash table, a dictionary is likely what you want.
However, sometimes we need more: more control, more flexibility, and objects that fit the domain of the problem we are solving.
Named Tuples for the Win (or Almost)
Named tuples offer a friendly and clean way to define data holders that behave like objects from a class we defined.
I'm going to use a Payment instance to illustrate the next examples.
from collections import namedtuple
Payment = namedtuple("Payment", ["id_", "amount", "method"])
payment = Payment(1, amount=123, method="CC")
>>> print(payment)
Payment(id_=1, amount=123, method='CC')
If you like typing, you can also use the typed version of the named tuple:
from typing import NamedTuple
class Payment(NamedTuple):
    id_: int
    amount: int
    method: str
payment = Payment(id_=2, amount=1234, method="ACH")
>>> print(payment)
Payment(id_=2, amount=1234, method='ACH')
>>> payment.id_, payment.amount, payment.method
(2, 1234, 'ACH')
That seems to solve the data holder quest quickly, right? Simple API, available in the standard library, no class boilerplate.
However, it is essential to remember that Payment, our NamedTuple, is still a tuple behind the scenes.
And why is that important? Because every instance of Payment, for good or bad, will behave just like a tuple.
Named Tuples Are Still Tuples
You can call len() on your named tuple instance:
>>> len(payment)
3
You can unpack it:
>>> id_, amount, method = payment
>>> id_, amount, method
(2, 1234, 'ACH')
Named tuples are immutable:
>>> payment.amount = 345
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
This one can be very counterintuitive: if you compare a Payment named tuple instance with a plain tuple holding the same values, the comparison evaluates to True:
>>> payment = Payment(id_=2, amount=1234, method="ACH")
>>> payment == (2, 1234, "ACH")
True
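A quick sketch of where this can bite: two unrelated named tuple types with the same values also compare equal, because only the underlying tuple is compared. Refund is a hypothetical second type I'm introducing just for this illustration.

```python
from collections import namedtuple

Payment = namedtuple("Payment", ["id_", "amount", "method"])
# Refund is a hypothetical second type, introduced only for this illustration.
Refund = namedtuple("Refund", ["id_", "amount", "method"])

payment = Payment(id_=2, amount=1234, method="ACH")
refund = Refund(id_=2, amount=1234, method="ACH")

# Equality is plain tuple equality, so the type name plays no part.
print(payment == refund)  # True

# Indexing and slicing work too, because the instance is a tuple.
print(payment[1])   # 1234
print(payment[:2])  # (2, 1234)
```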
Data Classes as Data Holders
If named tuples are almost what you want, but you are still missing some flexibility, data classes might be a good fit.
We can smoothly go from a typed named tuple to a data class. Remove the NamedTuple inheritance and decorate the class with @dataclass:
from dataclasses import dataclass
@dataclass
class Payment:
    id_: int
    amount: int
    method: str
As far as a data holder is concerned, the Payment instance works almost like the named tuple:
>>> payment
Payment(id_=2, amount=1234, method='ACH')
>>> payment.id_, payment.amount, payment.method
(2, 1234, 'ACH')
However, the instance is now mutable, has no len(), and no longer compares equal to a tuple:
>>> assert payment == (2, 1234, "ACH")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
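Here is a small sketch of those three differences. I'm also using astuple() and asdict() from the dataclasses module, which are not part of the example above, to show an explicit conversion for the cases where you do want tuple or dict behavior.

```python
from dataclasses import asdict, astuple, dataclass


@dataclass
class Payment:
    id_: int
    amount: int
    method: str


payment = Payment(id_=2, amount=1234, method="ACH")

# Instances are mutable by default.
payment.amount = 345

# There is no tuple behavior, so len() raises a TypeError.
try:
    len(payment)
except TypeError as error:
    print(type(error).__name__)  # TypeError

# Explicit conversions are available when you need them.
print(astuple(payment))  # (2, 345, 'ACH')
print(asdict(payment))   # {'id_': 2, 'amount': 345, 'method': 'ACH'}
```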
Field by Field Implementation
If we aim for more control and flexibility, the default fields in a data class may not be enough. That is why each field can be configured in a more granular way using field().
In the example below, I'm defining quite a few setup options in a few lines:
- id_ is an auto-generated UUID field that won't be used for comparison with another Payment instance and won't display its value in the object's repr.
- amount is a Decimal field initialized to a zero Decimal if no value is passed.
- method defaults to CC on every object.
from dataclasses import dataclass, field
from decimal import Decimal
from uuid import UUID, uuid4
@dataclass(frozen=True)
class Payment:
    id_: UUID = field(repr=False, compare=False, default_factory=uuid4)
    amount: Decimal = field(default_factory=Decimal)
    method: str = field(default='CC')
payment = Payment()
>>> payment
Payment(amount=Decimal('0'), method='CC')
>>> payment.id_
UUID('05d37e88-d069-484a-b454-42c2ca3594fc')
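To see what those field() options do in practice, here is a minimal sketch with two payments created from the defaults. I'm annotating id_ as UUID here, since that is the type uuid4() returns.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Payment:
    id_: UUID = field(repr=False, compare=False, default_factory=uuid4)
    amount: Decimal = field(default_factory=Decimal)
    method: str = field(default="CC")


first = Payment()
second = Payment()

# default_factory runs once per instance, so each payment gets its own ID.
print(first.id_ != second.id_)  # True

# id_ is excluded from comparison, so the two payments still compare equal.
print(first == second)  # True

# id_ is excluded from the repr as well.
print(repr(first))  # Payment(amount=Decimal('0'), method='CC')
```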
Why Not a Regular Class?
Still, this looks like something we could do with a regular class, right? Yes, we could. And data classes are indeed regular classes.
What makes them shine, besides their configurability, is their code-saving approach.
The Class in Data Classes
Data classes save considerable amounts of boilerplate code. They ship with pre-defined dunder methods that we would likely have to write ourselves.
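For a sense of scale, this is roughly what we would write by hand to get the same basics for a three-field Payment. It is a sketch of the idea, not the exact code the decorator generates.

```python
class Payment:
    """Hand-written rough equivalent of a three-field data class."""

    def __init__(self, id_: int, amount: int, method: str) -> None:
        self.id_ = id_
        self.amount = amount
        self.method = method

    def __repr__(self) -> str:
        return (
            f"{type(self).__name__}(id_={self.id_!r}, "
            f"amount={self.amount!r}, method={self.method!r})"
        )

    def __eq__(self, other: object) -> bool:
        # Only instances of the very same class are compared field by field.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.id_, self.amount, self.method) == (
            other.id_,
            other.amount,
            other.method,
        )


payment = Payment(id_=2, amount=1234, method="ACH")
print(payment)                             # Payment(id_=2, amount=1234, method='ACH')
print(payment == Payment(2, 1234, "ACH"))  # True
```

With @dataclass, the three methods above collapse into three annotated lines.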
Data classes have a built-in string representation for friendly printing, so you don't need to write __str__() or __repr__() methods:
payment = Payment(id_=2, amount=1234, method="ACH")
# Non-dataclass default printing:
<__main__.Payment object at 0x10ae98d00>
# Dataclass built-in string representation:
Payment(id_=2, amount=1234, method='ACH')
Default Values
It is possible to initialize a data class with default values:
@dataclass
class Payment:
    id_: int
    amount: int
    method: str = 'CC'
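Two details about defaults are worth a quick sketch: fields with a default must come after fields without one, and mutable defaults need default_factory. The tags field and the Broken class below are hypothetical, added only to demonstrate these two points.

```python
from dataclasses import dataclass, field


@dataclass
class Payment:
    id_: int
    amount: int
    method: str = "CC"
    # tags is a hypothetical extra field, added only to show a mutable default.
    tags: list = field(default_factory=list)


payment = Payment(id_=1, amount=1234)
print(payment)  # Payment(id_=1, amount=1234, method='CC', tags=[])

# Each instance gets its own list, so instances don't share state.
other = Payment(id_=2, amount=10)
other.tags.append("refund")
print(payment.tags)  # []

# A field without a default cannot follow a field that has one.
try:

    @dataclass
    class Broken:  # hypothetical class, defined only to trigger the error
        method: str = "CC"
        amount: int

except TypeError as error:
    print(f"TypeError: {error}")
```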
Validation
The __post_init__() method lets you inject initialization logic into a data class. It runs right after the generated __init__(), which makes it useful for things like validation.
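A minimal validation sketch could look like the one below. The rules themselves (non-negative amounts and a fixed set of accepted methods) are assumptions I made up for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Payment:
    id_: int
    amount: int
    method: str = "CC"

    def __post_init__(self) -> None:
        # Called by the generated __init__ once every field has been assigned.
        if self.amount < 0:
            raise ValueError(f"amount must be non-negative, got {self.amount}")
        # The accepted methods are a hypothetical rule for this example.
        if self.method not in {"CC", "ACH"}:
            raise ValueError(f"unsupported method: {self.method!r}")


payment = Payment(id_=1, amount=1234, method="ACH")

try:
    Payment(id_=2, amount=-5)
except ValueError as error:
    print(error)  # amount must be non-negative, got -5
```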
Comparable by Default
Data classes implement __eq__() by default. Two instances of the same class are compared field by field, as if they were tuples of their field values:
payment_1 = Payment(id_=1, amount=1234, method="ACH")
payment_2 = Payment(id_=1, amount=1234, method="ACH")
>>> payment_1 == payment_2
True
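One caveat, sketched below: the generated __eq__() only compares instances of the exact same class, and you can opt out of it with eq=False. Refund and Token are hypothetical classes I'm adding just to show both cases.

```python
from dataclasses import dataclass


@dataclass
class Payment:
    id_: int
    amount: int
    method: str


# Refund is a hypothetical class with the same fields as Payment.
@dataclass
class Refund:
    id_: int
    amount: int
    method: str


payment = Payment(id_=1, amount=1234, method="ACH")
refund = Refund(id_=1, amount=1234, method="ACH")

# Same values, different classes: not equal.
print(payment == refund)  # False


# Token is hypothetical; eq=False keeps the default identity comparison.
@dataclass(eq=False)
class Token:
    value: str


print(Token("a") == Token("a"))  # False
```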
Making It Sortable
Passing order=True to the dataclass decorator implements __lt__(), __le__(), __gt__(), and __ge__(), making the instances fully sortable.
Undoubtedly one of my favorite capabilities in data classes. So much code saved!
@dataclass(order=True)
class Payment:
    id_: int
    amount: int
    method: str
>>> payment_1 = Payment(id_=1, amount=1234, method="ACH")
>>> payment_2 = Payment(id_=1, amount=3456, method="ACH")
>>> payment_2 > payment_1
True
>>> payment_1 > payment_2
False
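The comparison goes through the fields in the order they are defined, which the sketch below makes visible with sorted(). The list of payments is made-up sample data.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass(order=True)
class Payment:
    id_: int
    amount: int
    method: str


# Made-up sample data for the illustration.
payments = [
    Payment(id_=2, amount=100, method="CC"),
    Payment(id_=1, amount=3456, method="ACH"),
    Payment(id_=1, amount=1234, method="ACH"),
]

# Fields are compared in definition order: id_, then amount, then method.
for payment in sorted(payments):
    print(payment)
# Payment(id_=1, amount=1234, method='ACH')
# Payment(id_=1, amount=3456, method='ACH')
# Payment(id_=2, amount=100, method='CC')

# A key function is still the way to order by one specific field.
largest = max(payments, key=attrgetter("amount"))
print(largest.amount)  # 3456
```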
Making It Immutable (or Almost)
Python doesn't let you make instances of your own classes genuinely immutable. However, you can create a frozen data class that emulates immutability pretty well.
Passing frozen=True to the dataclass decorator implements __setattr__() and __delattr__() in a way that raises a FrozenInstanceError when someone tries updating or deleting a field:
@dataclass(frozen=True)
class Payment:
    id_: int
    amount: int
    method: str
payment = Payment(id_=1, amount=1234, method="ACH")
>>> payment.amount = 2345
"""
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 4, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'amount'
"""
>>> del payment.method
"""
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 4, in __delattr__
dataclasses.FrozenInstanceError: cannot delete field 'method'
"""
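Two follow-ups are worth a short sketch. When you need a changed copy of a frozen instance, replace() from the dataclasses module builds a new one. And the "almost" in the heading is real: object.__setattr__() goes around the frozen check, so treat frozen as a strong convention rather than a guarantee.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Payment:
    id_: int
    amount: int
    method: str


payment = Payment(id_=1, amount=1234, method="ACH")

# replace() returns a new instance with the given fields changed.
updated = replace(payment, amount=2345)
print(updated)         # Payment(id_=1, amount=2345, method='ACH')
print(payment.amount)  # 1234, the original is untouched

# FrozenInstanceError is a subclass of AttributeError.
try:
    payment.amount = 0
except FrozenInstanceError as error:
    print(isinstance(error, AttributeError))  # True

# The "almost": object.__setattr__ bypasses the frozen check entirely.
object.__setattr__(payment, "amount", 0)
print(payment.amount)  # 0
```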
Can It Be Hashable?
The short answer is most likely. However, this topic can get quite convoluted, so I'll defer it to a separate article.
Did you find this article helpful? What are some of your use cases for data classes?