Rapid prototyping of APIs using Marshmallow

Marshmallow is a lightweight library for serializing and deserializing objects, i.e. converting complex objects to and from native Python datatypes. Once a Marshmallow schema is defined, app-level objects can be serialized to native Python datatypes (before being rendered as JSON), and input data can be validated and deserialized into app-level objects.

Integrating Marshmallow with Flask allows APIs to be built quickly, with relatively light dependencies and code that is easy to follow. When building a prototype API, the aim is usually to support the basic HTTP methods and CRUD operations, and to serve data as JSON. The code should also be easy to modify, since at the outset we may not know exactly what the API needs to provide.

A basic API could be built with Flask on its own, plus an ORM such as SQLAlchemy to support a database. The following example is an endpoint that returns a specified number of recent air quality measurements for one site within a collection of monitoring sites:

from flask import Flask, jsonify
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
db = SQLAlchemy(app)

class Site(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(100), unique=True)
    site_code = db.Column(db.String(10), unique=True)
    region = db.Column(db.String(100))
    # One site owns many hourly readings; each reading gets an `owner` backref
    hourly_data = db.relationship('HourlyData',
                                  backref='owner', lazy='dynamic')

# DataMixin is an app-specific mixin that is not shown here
class HourlyData(DataMixin, db.Model):
    id = db.Column(db.Integer, primary_key=True)
    ozone = db.Column(db.String(10))
    no2 = db.Column(db.String(10))
    so2 = db.Column(db.String(10))
    pm10 = db.Column(db.String(10))
    time = db.Column(db.String(20))
    site_id = db.Column(db.Integer, db.ForeignKey('site.id'))

def site_aq_values(site_code, number):
    # Most recent `number` readings for one site (ordering by id is an assumption)
    qs = (HourlyData.query.join(Site)
          .filter(Site.site_code == site_code.upper())
          .order_by(HourlyData.id.desc())
          .limit(number)
          .all())
    if qs:
        # Hand-build one dict per reading...
        data_keys = ['ozone', 'no2', 'so2', 'pm10']
        data_list = [{'time': a.time,
                      'values': {b: getattr(a, b) for b in data_keys}}
                     for a in qs]
        # ...and another for the site that owns the readings
        site = qs[0].owner
        site_keys = ['name', 'region', ...]
        site_data = {b: getattr(site, b) for b in site_keys}
        all_data = {site_code.upper(): {'aq_data': data_list,
                                        'site_data': site_data}}
        return jsonify(all_data)
    return jsonify({'message': 'no data'})

Manually converting models into dictionaries like this for every route is repetitive, hard to read, and slow to write. Marshmallow is ORM- and framework-agnostic: it works with plain Python objects and complex dictionary structures as well as with the objects returned by various ORMs. The result of the SQLAlchemy query in the example above can be serialized with Marshmallow to return a similar response:

from flask_marshmallow import Marshmallow
from app.models import HourlyData, Site

ma = Marshmallow()

# HourlyDataSchema is defined first because SiteSchema nests it
class HourlyDataSchema(ma.Schema):
    id = ma.Integer(dump_only=True)
    # Values pulled from the owning Site via the `owner` backref
    site_name = ma.Function(lambda obj: obj.owner.name)
    site_code = ma.Function(lambda obj: obj.owner.site_code)
    # Reformat 'dd/mm/yyyy HH:MM' as 'yyyy-mm-dd HH:MM'
    time = ma.Method(serialize='format_time')

    def format_time(self, obj):
        return '{}-{}-{} {}'.format(
            *obj.time.split(' ')[0].split('/')[::-1],
            obj.time.split(' ')[1])

    class Meta:
        # Model attributes serialized as-is, without explicit field declarations
        additional = ('ozone', 'no2', 'so2', 'pm10')

class SiteSchema(ma.Schema):
    id = ma.Integer(dump_only=True)
    data = ma.Nested(HourlyDataSchema, many=True,
                     allow_none=True, dump_only=True)
    url = ma.URLFor('Site.site_detail', values=dict(id='<id>'))
    user = ma.Function(lambda obj: obj.user.name)

    class Meta:
        additional = ('name', 'site_code', 'region', ...)

many_data_schema = HourlyDataSchema(many=True)
site_schema = SiteSchema()

The schemas can then be reused across multiple views:

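from flask import Blueprint, jsonify
from app.models import HourlyData, Site
from app.schemas import current_hour_schema, many_data_schema, data_schema, hourlydata_schema, site_schema

hourly_data = Blueprint('hourly_data', __name__, url_prefix='/data')

def recent_aq():
    # Readings grouped by site, serialized with a single call
    data = HourlyData.query.group_by(HourlyData.site_id)
    return current_hour_schema.jsonify(data)

def get_aq_data(site_code, number):
    # Most recent `number` readings for one site (ordering by id is an assumption)
    data = (HourlyData.query.join(Site)
            .filter(Site.site_code == site_code.upper())
            .order_by(HourlyData.id.desc())
            .limit(number)
            .all())
    if not data:
        return jsonify({'message': 'no data'})
    return jsonify({
        'site info': site_schema.dump(data[0].owner),
        'aq data': many_data_schema.dump(data)})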
