In programming, serialization is the process of turning an object into a new format that can be stored (e.g. files or databases) or transmitted (e.g. over the internet). Deserialization, therefore, is the process of turning something in that format into an object. Serialization is often called "marshalling", and deserialization, "unmarshalling".

That's where the name "marshmallow" comes from.

Marshmallow is a Python library developed to simplify the process of serialization and deserialization. It can take our Python objects and turn them into native Python data types such as dictionaries or strings, and also the other way round.

Serialization with marshmallow and Python

First, we must install marshmallow:

pip install marshmallow

Make sure you install marshmallow 3 or newer, as the code we write in this blog post is for marshmallow 3 (older versions behave differently).

Once it's installed, you can go ahead and create a Schema.

A Schema definition tells marshmallow what individual pieces of data it will deal with when serializing or deserializing. Just to repeat once more:

  • Marshmallow will serialize our objects into native data types containing these individual pieces of data. For example, it can turn an object into a dictionary.
  • Marshmallow will deserialize native data types containing these individual pieces of data into our objects.

So let's say we have this class:

class Store:
    def __init__(self, name: str, location: str):
        self.name = name
        self.location = location

This is a very simple class whose constructor has two parameters: two strings representing the store's name and its location.

As humans, we could easily identify that the following dictionary could represent an instance of that class:

{
	"name": "Walmart",
	"location": "Venice, CA"
}

But Python has no built-in way to take something like a Store object and turn it into a dictionary with exactly the keys and value types we want.
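For example, the standard library's json module refuses to serialize a Store instance on its own, because it has no idea which attributes to include. A quick sketch (the Store class is repeated so it runs by itself):

```python
import json


class Store:
    def __init__(self, name: str, location: str):
        self.name = name
        self.location = location


walmart = Store("Walmart", "Venice, CA")

error_message = None
try:
    json.dumps(walmart)
except TypeError as err:
    # json only knows how to encode built-in types such as dict, list, str and int
    error_message = str(err)

print(error_message)  # Object of type Store is not JSON serializable
```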

That's where marshmallow comes in, but we have to tell it what attributes of the object it needs to use in order to construct the dictionary: name and location.

We do that by creating a Schema:

from marshmallow import Schema, fields

class StoreSchema(Schema):
	name = fields.Str()
	location = fields.Str()

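Here we've created the StoreSchema class, which inherits from marshmallow's Schema class. It contains two class attributes, name and location. The names are important: by default they must match the attribute names on our Store objects, and they become the keys of the resulting dictionary. The values are important too: fields.Str() tells marshmallow that each of these fields holds a string.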

When we use marshmallow to create a dictionary out of a Python object, the result will be a dictionary with two keys: name and location. The values will be strings.

Now, to turn the object into a dictionary we need to do three things:

  1. Import our Store and StoreSchema classes.
  2. Create a StoreSchema object that is used to actually perform serialization.
  3. "Dump" the object through the StoreSchema object with .dump(). That gives us a dictionary.

from store import Store
from schema import StoreSchema

walmart = Store("Walmart", "Venice, CA")
store_schema = StoreSchema()

print(store_schema.dump(walmart))
# {'name': 'Walmart', 'location': 'Venice, CA'}

A typical question at this point is: "why do we need to create the StoreSchema object?" It's because we can pass some configuration options at that point to slightly modify or limit what the schema does[1].

Deserialization with marshmallow and Python

Before deserializing, marshmallow can validate the data to be deserialized.

We can add validation rules so that errors will be raised if the data does not agree with those rules.

At the moment the only validation rules we have are that the fields name and location must be strings, so let's double check that by:

  1. Importing our StoreSchema class.
  2. Getting our store data (which might be in a file, given by our users, or in this case just hard-coded).
  3. Creating our StoreSchema object.
  4. Using .load() to pass the data through the schema for validation.

from schema import StoreSchema

store_data = {"name": "Walmart", "location": "Venice, CA"}
store_schema = StoreSchema()

print(store_schema.load(store_data))
# {'name': 'Walmart', 'location': 'Venice, CA'}

No problem here, because the fields are indeed strings!

If we try this though, we'll get an error:

store_data = {"name": 5, "location": "Venice, CA"}
print(store_schema.load(store_data))

You should get an error like this one:

Traceback (most recent call last):
  File "main.py", line 11, in <module>
    print(store_schema.load(store_data))
  File "/home/runner/.local/share/virtualenvs/python3/lib/python3.8/site-packages/marshmallow/schema.py", line 722, in load
    return self._do_load(
  File "/home/runner/.local/share/virtualenvs/python3/lib/python3.8/site-packages/marshmallow/schema.py", line 904, in _do_load
    raise exc
marshmallow.exceptions.ValidationError: {'name': ['Not a valid string.']}

Beautiful, innit! Not a valid string is what it says at the end, which is accurate!

If we want to turn our validated dictionary into a Store object, we can pass each key of the dictionary as a named argument to the Store constructor:

from store import Store
from schema import StoreSchema

store_data = {"name": "Walmart", "location": "Venice, CA"}
store_schema = StoreSchema()

store = Store(**store_schema.load(store_data))
print(vars(store))  # vars() shows the Store object's attributes as a dictionary
# {'name': 'Walmart', 'location': 'Venice, CA'}

By default, when we deserialize, marshmallow only validates the data (converting values to each field's type where needed) and returns a dictionary. It doesn't create an object for us.

But now that we've got validation out of the way, let's modify our schema slightly so that it does create a Store object when it's done validating. I'll show you how to do this, although it's not something I'd normally do:

from marshmallow import Schema, fields, post_load
from store import Store


class StoreSchema(Schema):
    name = fields.Str()
    location = fields.Str()

    @post_load
    def make_store(self, data, **kwargs):
        return Store(**data)

We've now added a method decorated with @post_load. It runs after the default loading operations conclude (i.e. after the data has been validated successfully).

The make_store method receives data: the entire validated dictionary that marshmallow has processed. It also receives some extra keyword arguments (such as many and partial), which is why its signature includes **kwargs; you can find more on that in the official documentation[2].

With this schema, .load() no longer gives us the validated dictionary. It validates the data and then immediately executes make_store, which gives us the Store object:

from schema import StoreSchema

store_data = {"name": "Walmart", "location": "Venice, CA"}
store_schema = StoreSchema()

print(store_schema.load(store_data))
# <store.Store object at 0x7ff2a18c>

We'll see in a moment that having our schema create objects for us can be a blessing, but it can also be a bit limiting at times! I'd generally avoid making schemas return objects, and instead create the objects in the code that uses the schema.

How to store data into a MongoDB database

This post isn't going to be a complete beginner's guide to MongoDB! If you've never used MongoDB before, you can check out the official introduction first.

MongoDB is a non-relational (document) database that stores JSON-like documents (internally, in a binary format called BSON). These documents are searchable, but MongoDB doesn't have the concept of table definitions, so the documents in one collection don't all have to have the same structure.

In MongoDB, tables are called "collections" and rows are called "documents", since the concept of a "column" doesn't really apply when every document can have different fields.

The easiest way to start interacting with MongoDB in Python is to install the pymongo library:

pip install pymongo

Then, we can create a database.py file that will handle the interaction with MongoDB. It contains a Database class with:

  • An initialize() method that handles creating the MongoDB connection.
  • A save_to_db() method that saves the data parameter to the stores collection in MongoDB.
  • A load_from_db() method that uses the query parameter to find all matching elements in the stores collection.

Note that this is by no means the best way to interact with MongoDB (particularly in larger applications), but the purpose of this blog post is to teach you about marshmallow serialization and deserialization, not MongoDB best practices!

Here's the sample database.py file:

import pymongo

class Database:
	@classmethod
	def initialize(cls):
		client = pymongo.MongoClient("mongodb://localhost:27017/test_db")
		cls.database = client.get_default_database()

	@classmethod
	def save_to_db(cls, data):
		cls.database.stores.insert_one(data)

	@classmethod
	def load_from_db(cls, query):
		return cls.database.stores.find(query)

Let's go and save a dictionary to the database to test this out:

from database import Database

Database.initialize()
Database.save_to_db({"name": "Walmart", "location": "Venice, CA"})

# find() returns a cursor, so we turn it into a list to print the results
loaded_objects = list(Database.load_from_db({"name": "Walmart"}))
print(loaded_objects)
# [{'_id': ObjectId('5e7cea2c0d86c32f5a934f92'), 'name': 'Walmart', 'location': 'Venice, CA'}]

Note that when we query the database, MongoDB finds every document in the stores collection whose name is Walmart and returns them all. At the moment we only have one, but each time you run the file it inserts the same store again, so the number of results grows with every run.

Note that MongoDB generated the _id field for us, and its value is an ObjectId. In this post we'll generate our own ids instead of relying on MongoDB's defaults, using Python's uuid module.
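Python's uuid module can produce random ids that we can store as plain strings. A quick sketch of the two common string forms; the .hex form (the one the Store model below uses) drops the hyphens:

```python
import uuid

# uuid4() generates a random UUID; .hex gives 32 lowercase hexadecimal characters
new_id = uuid.uuid4().hex
print(len(new_id))  # 32

# str() gives the hyphenated form instead, which is 36 characters long
hyphenated_id = str(uuid.uuid4())
print(len(hyphenated_id))  # 36
```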

This was a very quick primer on MongoDB. Now let's see how we could use marshmallow with this application. First, we'll change the Schema to have an _id field:

from marshmallow import Schema, fields, post_load
from store import Store


class StoreSchema(Schema):
    _id = fields.Str()
    name = fields.Str()
    location = fields.Str()

    @post_load
    def make_store(self, data, **kwargs):
        return Store(**data)

Then we'll change the model to accept it in the __init__ method. We'll give _id a default value of None, so that a new UUID is generated if no id is passed in:

import uuid


class Store:
    def __init__(self, name: str, location: str, _id: str = None):
        self.name = name
        self.location = location
        self._id = _id or uuid.uuid4().hex

Finally, we can use the two together:

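from database import Database
from schema import StoreSchema
from store import Store

store_schema = StoreSchema()

Database.initialize()
# Dump a Store object so the saved document includes our generated string _id
# (documents saved without one get an ObjectId from MongoDB, which fields.Str() rejects)
Database.save_to_db(store_schema.dump(Store("Walmart", "Venice, CA")))

loaded_objects = Database.load_from_db({"name": "Walmart"})

for loaded_store in loaded_objects:
    store = store_schema.load(loaded_store)
    print(store.name)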

What we've done here is take each dictionary MongoDB returns and pass it through the .load() method of our StoreSchema object, which gives us a Store object. Then, we can access that Store object's properties (or methods, if it had any!).

How to handle user data in this process

Instead of (or as well as) using marshmallow to handle serializing and deserializing from MongoDB, you could use marshmallow to handle user data.

Let's say a user gives you some store data to save into MongoDB. The process looks like this:

  1. First we'll get user input, usually as a string.
  2. We then convert it to a dictionary.
  3. We then pass it through our schema for validation. That gives us an object.
  4. We then use .dump() to get back the validated dictionary, and save that to MongoDB.

Here's a perfect example of where we could save a bit of work if our StoreSchema didn't give us Store objects.

import json
from database import Database
from schema import StoreSchema

store_schema = StoreSchema()

Database.initialize()

user_input = input("Enter a store dictionary: ")
user_dict = json.loads(user_input)
user_object = store_schema.load(user_dict)

Database.save_to_db(store_schema.dump(user_object))

loaded_objects = Database.load_from_db({"name": "Walmart"})

for loaded_store in loaded_objects:
    store = store_schema.load(loaded_store)
    print(store.name)

Below is the same code if the schema didn't give us Store objects (that is, without the @post_load method):

import json
import uuid
from database import Database
from schema import StoreSchema
from store import Store

store_schema = StoreSchema()

Database.initialize()

user_input = input("Enter a store dictionary: ")
user_dict = json.loads(user_input)
validated_dict = store_schema.load(user_dict)
# Without post_load, Store isn't generating the id for us, so add one if it's missing
validated_dict.setdefault("_id", uuid.uuid4().hex)

Database.save_to_db(validated_dict)

loaded_objects = Database.load_from_db({"name": "Walmart"})

for loaded_store in loaded_objects:
    store = Store(**store_schema.load(loaded_store))
    print(store.name)

The benefit of this is that we can use the same schema to validate both the user's data and the database's data (in case the latter has been changed later on by another part of the application), and only create Store objects where we actually need them.

Wrapping Up

In this blog post we've learned how to use marshmallow to serialize and deserialize data, and how we can use that with MongoDB.

I hope this post has been useful, and thanks for reading!


  1. Filtering output (Marshmallow Official Documentation)

  2. marshmallow.decorators.post_load (Marshmallow Official Documentation)