Learn How Dataclasses Work in Python

It is common for us to work with simple data structures such as a Tuple (tuple) or a Dictionary (dict) in Python. As programmers, we use them almost everywhere, every day, to store data.

For instance, we can represent a car object with the code example below:

# Using Dictionary
car = {"name": "Model X", "brand": "Tesla", "price": 120_000}

# OR using Tuple
car = ("Model X", "Tesla", 120_000)

Yet these basic data structures become less ideal as our data grows more complex. Nothing about the variable car tells us (or our editor) that it holds a car's details as a Dictionary or Tuple rather than some string or integer; we simply have to remember.

Using a Tuple to represent our car object in the example above works just fine if we only have three fields (name, brand, and price). As we add more fields to our car object, such as manufacturer and condition, we need to remember the order of our attributes (is car[2] the price or the brand?).

With a Dictionary, we are not able to use dot notation (i.e. car.name) to access our attributes; we have to write car["name"] instead. Plus, a deeply nested Dictionary tends to be very messy to work with.
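To make these limitations concrete, here is a small sketch using the car data from above. The variable names car_tuple and car_dict are just for illustration.

```python
car_tuple = ("Model X", "Tesla", 120_000)
car_dict = {"name": "Model X", "brand": "Tesla", "price": 120_000}

# With a tuple, we must remember that index 2 holds the price.
print(car_tuple[2])  # 120000

# With a dict, we access values through string keys instead.
print(car_dict["price"])  # 120000

try:
    print(car_dict.name)  # dicts do not support dot notation
except AttributeError as exc:
    print(f"No dot notation: {exc}")
```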

Here, we are going to talk about better alternatives to our regular Dictionary or Tuple.

Topics covered

  • Named Tuple
  • Data Classes, a better alternative to Named Tuple
  • Customizing Data Classes
  • When to use Data Classes

Let’s start!

Named Tuple To The Rescue

A common next step is to use Named Tuple (namedtuple) from Python's standard-library collections module.

Using our car example above, here is what a Named Tuple would look like:

from collections import namedtuple

Car = namedtuple('Car', ['name', 'brand', 'price'])
car = Car('Model X', 'Tesla', 120_000)

Much better. So, why not just use Named Tuple all the time?

Well, Named Tuple does come with its own set of restrictions. Default values are awkward to declare (namedtuple only gained a defaults parameter in Python 3.7), and Named Tuple is immutable by nature: once a car is created, we cannot change its price in place.
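Here is a short sketch of that immutability, assuming Python 3.7 or newer so that namedtuple accepts the defaults parameter (defaults are applied to the rightmost fields). The _replace method shown at the end is namedtuple's documented way to get a modified copy.

```python
from collections import namedtuple

# defaults=[120_000] supplies a default for the last field, price.
Car = namedtuple("Car", ["name", "brand", "price"], defaults=[120_000])

car = Car("Model X", "Tesla")
print(car.price)  # 120000

try:
    car.price = 100_000  # namedtuple instances are immutable
except AttributeError as exc:
    print(f"Cannot modify: {exc}")

# The usual workaround: build a new instance with one field replaced.
cheaper = car._replace(price=100_000)
print(cheaper)  # Car(name='Model X', brand='Tesla', price=100000)
```

Note that car itself is untouched; _replace always returns a new tuple.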

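PEP 557, the proposal that introduced Data Classes, explains why we shouldn’t just use Named Tuple. Among other points, it notes that a namedtuple can be accidentally compared equal to any other namedtuple (or plain tuple) holding the same values, and that instances are always iterable, which makes it hard to add fields later without breaking code that unpacks them.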
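One commonly cited pitfall is accidental equality: because a Named Tuple is still a tuple, equality only looks at the values. In this sketch, Phone is a hypothetical, unrelated type added purely to show the problem.

```python
from collections import namedtuple

Car = namedtuple("Car", ["name", "brand", "price"])
Phone = namedtuple("Phone", ["name", "brand", "price"])  # hypothetical unrelated type

car = Car("Model X", "Tesla", 120_000)

print(car == ("Model X", "Tesla", 120_000))        # True: a namedtuple is a tuple
print(car == Phone("Model X", "Tesla", 120_000))   # True, even though the types differ

# Instances are iterable, so callers may unpack them positionally;
# adding a fourth field later would break this line.
name, brand, price = car
```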

Moreover, a Dictionary, a Tuple, or even a Named Tuple gives us no natural place to define custom methods, which raises the question: why not just use a regular Python class?

Python Class

In Python, everything is an object, and most objects have attributes and methods. Typically, we would use class in Python to create our own custom objects with their own properties and methods.
Here is our previous car example written as a simple class:

class Car:
    def __init__(self, name: str, brand: str, price: int) -> None:
        self.name = name
        self.brand = brand
        self.price = price


car1 = Car('Model X', 'Tesla', 120_000)
car2 = Car('Model X', 'Tesla', 120_000)

car1 == car2 # False: by default, == compares identity, so we would have to write our own __eq__ method.

Every time a new property is added to our car object, we need to add it to the __init__ signature and assign it by hand. What if we want a more descriptive representation of our car object in its __repr__ method? What if we need to compare two Car instances by their values? We would have to write those methods ourselves, too.
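To see how much code that is, here is a sketch of the same Car class with hand-written __repr__ and __eq__ methods. The exact repr format is our own choice; it simply mimics what Data Classes produce.

```python
class Car:
    def __init__(self, name: str, brand: str, price: int) -> None:
        self.name = name
        self.brand = brand
        self.price = price

    def __repr__(self) -> str:
        # Every field has to be listed again by hand...
        return (
            f"Car(name={self.name!r}, brand={self.brand!r}, "
            f"price={self.price!r})"
        )

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Car):
            return NotImplemented
        # ...and listed yet again here.
        return (self.name, self.brand, self.price) == (
            other.name,
            other.brand,
            other.price,
        )


car1 = Car("Model X", "Tesla", 120_000)
car2 = Car("Model X", "Tesla", 120_000)
print(car1 == car2)  # True
print(car1)  # Car(name='Model X', brand='Tesla', price=120000)
```

Adding a single field now means touching three methods.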

Honestly, things aren’t that bad when we’re only dealing with a single car object. But what if we have to add more classes such as Manufacturer, CarDealer, etc.?

As you can already tell, the signs of code duplication are everywhere, and it smells! Truth be told, unless we actually need custom methods, we might be better off using Named Tuple.

I hate to be the bearer of bad news, but in real life we usually do need custom methods.

Enter Data Classes

Introduced in Python 3.7 by PEP 557, Data Classes (the dataclasses module) provide us with an easy way to make our classes less verbose. Put simply, Data Classes are just regular classes that abstract away a ton of boilerplate code for us.

To rewrite our previous example with Data Class, we simply have to decorate our basic class with @dataclass:

from dataclasses import dataclass

@dataclass
class Car:
    name: str # Fields are declared with type annotations
    brand: str
    price: int

car1 = Car('Model X', 'Tesla', 120_000)
car2 = Car('Model X', 'Tesla', 120_000)

car1 == car2 # True. __eq__ is generated automatically.

car2.name # Supports dot notation!

The best part of Data Class is that it automatically generates common dunder methods such as __init__, __repr__, and __eq__, eliminating all the duplicated code.
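As a quick check of what the decorator gives us, this sketch prints the generated __repr__ and exercises the generated __eq__ on the same Car class as above.

```python
from dataclasses import dataclass


@dataclass
class Car:
    name: str
    brand: str
    price: int


car = Car("Model X", "Tesla", 120_000)

# __repr__ is generated from the field declarations.
print(car)  # Car(name='Model X', brand='Tesla', price=120000)

# __eq__ compares the fields in declaration order.
print(car == Car("Model X", "Tesla", 120_000))  # True
print(car == Car("Model X", "Tesla", 99_000))   # False
```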

Customizing Data Class

  1. In certain cases, we might need to customize our Data Class fields:

    from dataclasses import dataclass, field
    
    @dataclass
    class Car:
        name: str = field(compare=False)  # To exclude this field from comparison
        brand: str = field(repr=False)  # To hide this field in __repr__
        price: int = 120_000
        condition: str = field(default='New')
    
  2. To run extra logic after the generated __init__, we can declare a __post_init__ method, which __init__ calls automatically. For example, we can adjust the price of the car based on its initialized condition:

    from dataclasses import dataclass, field
    
    @dataclass
    class Car:
        name: str = field(compare=False)
        brand: str = field(repr=False)
        price: int = 120_000
        condition: str = field(default='New')
    
        def __post_init__(self):
            # Runs right after the generated __init__ has set every field
            if self.condition == "Old":
                self.price -= 30_000
    
    old_car = Car('Model X', 'Tesla', 130_000, 'Old')
    # Car(name='Model X', price=100000, condition='Old')
    
  3. To make our Data Class immutable, we pass frozen=True to the decorator, i.e. @dataclass(frozen=True). Assigning to a field of a frozen instance then raises dataclasses.FrozenInstanceError.

  4. Another good use case of Data Class is replacing nested Dictionaries with nested, typed objects. Here’s a simple example of what a Data Class could do:

    # ...
    from typing import List
    
    @dataclass
    class CarDealer:
        cars: List[Car]
    
    car3 = Car('Model S', 'Tesla', 89_000)
    car4 = Car('Model Y', 'Tesla', 54_000)
    car_dealer = CarDealer(cars=[car3, car4])
    
    print(car_dealer)
    # CarDealer(cars=[Car(name='Model S', price=89000, condition='New'), Car(name='Model Y', price=54000, condition='New')])
    
  5. Lastly, Data Classes support inheritance too, since they are regular classes under the hood. A subclass inherits its parent's fields, and any new fields it declares are added after them.
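The field options, frozen instances, and inheritance described in this list can be exercised together in one short sketch. Vehicle, UsedCar, and the mileage field are hypothetical names chosen for illustration; Car mirrors the class from the first item.

```python
from dataclasses import FrozenInstanceError, dataclass, field


# Per-field options, as in the first item.
@dataclass
class Car:
    name: str = field(compare=False)  # ignored by the generated __eq__
    brand: str = field(repr=False)    # hidden from the generated __repr__
    price: int = 120_000
    condition: str = field(default="New")


# Two cars that differ only in name still compare equal.
print(Car("Model X", "Tesla") == Car("Plaid", "Tesla"))  # True
print(Car("Model X", "Tesla"))  # Car(name='Model X', price=120000, condition='New')


# A frozen base class and a frozen subclass (hypothetical names).
@dataclass(frozen=True)
class Vehicle:
    name: str
    brand: str


@dataclass(frozen=True)
class UsedCar(Vehicle):
    mileage: int = 0  # inherited fields come first in __init__


used = UsedCar("Model S", "Tesla", 42_000)
print(used)  # UsedCar(name='Model S', brand='Tesla', mileage=42000)

try:
    used.mileage = 0  # frozen instances reject attribute assignment
except FrozenInstanceError as exc:
    print(f"Cannot modify: {exc}")
```

Both classes in the hierarchy are frozen here on purpose: dataclasses does not allow a frozen class to inherit from a non-frozen one, or vice versa.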

So, when to use Data Class?

vs. Named Tuple

The use of Data Class is most often compared with the use of Named Tuple. For the most part, Data Class offers the same advantages as Named Tuple, if not more.

If you need to unpack your instances like tuples (for example, name, brand, price = car), you might want to consider using Named Tuple instead.
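The difference shows up as soon as we try to unpack. In this sketch, CarTuple and CarData are hypothetical names for the two versions of our car; dataclasses.astuple is the standard-library helper for converting a Data Class instance to a tuple explicitly.

```python
from collections import namedtuple
from dataclasses import astuple, dataclass

CarTuple = namedtuple("CarTuple", ["name", "brand", "price"])


@dataclass
class CarData:
    name: str
    brand: str
    price: int


# A Named Tuple unpacks directly.
name, brand, price = CarTuple("Model X", "Tesla", 120_000)

try:
    # Data Class instances are not iterable, so this raises TypeError.
    name, brand, price = CarData("Model X", "Tesla", 120_000)
except TypeError as exc:
    print(f"Cannot unpack: {exc}")

# With a Data Class, unpacking needs an explicit conversion.
name, brand, price = astuple(CarData("Model X", "Tesla", 120_000))
```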

vs. Dictionary

When our Dictionary has a fixed set of keys whose values have fixed types, it is almost always better to use Data Class.
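Switching from a Dictionary does not mean losing it: a sketch of moving data in both directions, assuming the dict keys match the field names exactly (Car(**raw) raises TypeError on an unexpected key). dataclasses.asdict recurses into nested Data Classes and lists, but the reverse direction has to be built by hand, as the last lines show.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Car:
    name: str
    brand: str
    price: int


@dataclass
class CarDealer:
    cars: List[Car]


raw = {"name": "Model X", "brand": "Tesla", "price": 120_000}
car = Car(**raw)  # dict -> Data Class via keyword unpacking

dealer = CarDealer(cars=[car])
data = asdict(dealer)  # nested Data Classes -> nested dicts
print(data)  # {'cars': [{'name': 'Model X', 'brand': 'Tesla', 'price': 120000}]}

# Going back from nested dicts requires building the inner objects ourselves.
rebuilt = CarDealer(cars=[Car(**c) for c in data["cars"]])
print(rebuilt == dealer)  # True
```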

In short, the rule of thumb is simple: if you create a Dictionary or a class that mostly consists of attributes describing the underlying data, use Data Class. It saves you a bunch of time.

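Finally, Data Class also keeps the type annotation of each field, which editors and static type checkers such as mypy can use. That is a huge added advantage! Keep in mind, though, that these types are not enforced at runtime.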
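The sketch below shows both halves of that point: the annotations can be read back through dataclasses.fields, yet a wrong type is accepted without complaint. The expected output assumes the module does not use from __future__ import annotations, which would turn each type into a string.

```python
from dataclasses import dataclass, fields


@dataclass
class Car:
    name: str
    brand: str
    price: int


# The annotations are kept and can be inspected at runtime.
for f in fields(Car):
    print(f.name, f.type)
# name <class 'str'>
# brand <class 'str'>
# price <class 'int'>

# They are not enforced, though: a string price is accepted.
odd_car = Car("Model X", "Tesla", "expensive")
print(odd_car.price)  # expensive
```

A static checker such as mypy would flag the last Car(...) call; the interpreter itself does not.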

Closing Thoughts

Again, there is nothing wrong with just creating regular classes in Python. However, that could mean writing a lot of repetitive and boilerplate code just to set up our class instance.

To summarize what we went through, Data Class is great because it:

  • Saves time and reduces code duplication
  • Offers flexibility: it can be mutable or immutable
  • Supports inheritance
  • Allows for customization and default values

Don’t get me wrong. Not every class in Python needs to be a Data Class. A Data Class is not a silver bullet.

Above all, we should keep in mind that we shouldn’t complicate things if we don’t have to. As long as we’re not dealing with something overly complex, a good old Dictionary might just do the job.

Thank you for reading!
