Job Vacancy: Python Developer – Oxfordshire, UK

As of December 2018, a Python Developer is required for a full-time office-based role in Witney, Oxfordshire. The role pertains to logistics management software and requires commercial Python experience and knowledge of web applications. Enquire via message.



Molecular visualization with 3Dmol.js

3DMol.js is an intuitive and highly flexible library for visualizing molecular structures within scientific applications. The simplest way to display and style protein or chemical structures is through a URL with the appropriate query string, or by embedding 3Dmol parameters within an HTML tag:

The data-cid or data-pdb attributes specify the desired molecule; these identifiers are available from PubChem and the Protein Data Bank respectively. The above code will render an interactive viewer containing the following:

Supplying data-* attributes allows styling to be customized. For example, to highlight all tryptophan amino acid residues on chain B, or to display the structure in ‘cartoon’ style:

The API gives greater control over how 3Dmol.js should interact with other events in the browser, as well as greater flexibility over styling. The following script is based upon the extensive documentation the library provides, and pertains to functional molecules, e.g. pharmaceuticals: it allows comparisons of 3D chemical structures through 3Dmol.js, as well as allowing users to obtain commercial sources and relevant scientific literature.

Rapid prototyping of APIs using Marshmallow

Marshmallow is a lightweight object serialization/de-serialization library, used for converting complex objects to and from native Python datatypes. By defining a marshmallow schema, app-level objects can be serialized to native Python datatypes (before rendering to JSON), and input data can be validated and de-serialized to app-level objects.

Integrating Marshmallow with Flask allows APIs to be created quickly, the dependencies are relatively light, and it is easy to see what the code is doing. When building a prototype API, the aims will usually be to support the basic HTTP methods and CRUD operations, and to serve data in JSON format. The code should also be flexible enough to be modified easily, largely because at the outset we may not know exactly what is needed.

A basic API could be created using Flask on its own, together with an ORM such as SQLAlchemy to support a database, as in the following example of an endpoint which provides a specified number of recent air quality measurements from a site within a collection of monitoring sites:

from flask import Flask, jsonify
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
db = SQLAlchemy(app)

class Site(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(100), unique=True)
    site_code = db.Column(db.String(10), unique=True)
    hourly_data = db.relationship('HourlyData',
                      backref='owner', lazy='dynamic')

class HourlyData(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    ozone = db.Column(db.String(10))
    no2 = db.Column(db.String(10))
    pm10 = db.Column(db.String(10))
    time = db.Column(db.String(20))
    site_id = db.Column(db.Integer, db.ForeignKey(''))

@app.route('/data/<site_code>/<int:days>')  # illustrative route
def site_aq_values(site_code, days):
    qs = HourlyData.query.join(Site).filter(
             Site.site_code == site_code.upper()).order_by(
             HourlyData.time.desc()).limit(days * 24).all()
    if qs:
        data_keys = ['ozone', 'no2', 'pm10']
        data_list = [{'time': a.time,
                      'values': {b: getattr(a, b) for b in data_keys}}
                     for a in qs]
        site_keys = ['name', 'site_code']
        site_data = {b: getattr(qs[0].owner, b) for b in site_keys}
        return jsonify({site_code.upper(): {
            'aq_data': data_list,
            'site_data': site_data}})
    return jsonify({'message': 'no data'})

Clearly, manually converting models into dictionaries for each route would involve a lot of repetition, make the code hard to read, and take a long time to write. Marshmallow is ORM/framework agnostic and works with complex Python dictionary structures as well as with various ORM objects. The SQLAlchemy query in the example above can be serialized to return the same response using Marshmallow:

from flask_marshmallow import Marshmallow

ma = Marshmallow()

class SiteSchema(ma.Schema):
    id = ma.Integer(dump_only=True)
    data = ma.Nested('HourlyDataSchema',
                     allow_none=True, dump_only=True)
    url = ma.URLFor('Site.site_detail', id='<id>')

    class Meta:
        additional = ('name', 'site_code', 'region')

class HourlyDataSchema(ma.Schema):
    id = ma.Integer(dump_only=True)
    site_name = ma.Function(lambda obj:
    site_code = ma.Function(lambda obj: obj.owner.site_code)
    time = ma.Method(serialize='format_time')

    def format_time(self, obj):
        # convert 'dd/mm/yyyy hh:mm:ss' into 'yyyy-mm-dd hh:mm:ss'
        date, clock = obj.time.split(' ')
        return '{}-{}-{} {}'.format(*date.split('/')[::-1], clock)

    class Meta:
        additional = ('ozone', 'no2', 'pm10')

site_schema = SiteSchema()
many_data_schema = HourlyDataSchema(many=True)

The schema can then be re-used across multiple views:

from flask import Blueprint, jsonify
from app.models import HourlyData, Site
from app.schemas import current_hour_schema, many_data_schema, site_schema

hourly_data = Blueprint('hourly_data', __name__, url_prefix='/data')

@hourly_data.route('/recent')  # illustrative route
def recent_aq():
    data = HourlyData.query.group_by(HourlyData.site_id)
    return current_hour_schema.jsonify(data)

@hourly_data.route('/<site_code>/<int:number>')  # illustrative route
def get_aq_data(site_code, number):
    data = HourlyData.query.join(Site).filter(
               Site.site_code == site_code.upper()).order_by(
               HourlyData.time.desc()).limit(number).all()
    return jsonify({
        'site info': site_schema.dump(data[0].owner),
        'aq data': many_data_schema.dump(data)})

Python unit testing with Mock

Unit testing is used to check that a certain unit of code behaves as expected. This unit should have a narrow, well-defined scope, and it is important that units are tested in isolation, such as by stubbing or mocking interactions with the outside world. By testing individual units in isolation from the external code they depend upon, failures in the code base can be identified more easily. To avoid individual tests breaking unnecessarily, this principle of keeping unit tests decoupled extends to, for example, building expected values from object attributes instead of ‘hard-coding’ them.

In the following example, the unittest.mock library allows a function to be tested in isolation from a helper function, which it uses to build a date string with a custom ordinal suffix. Here, the call to the helper function is mocked using the convenient patch decorator:

from datetime import datetime

def suffix(day):
    if day in (11, 12, 13):
        return 'th'
    return {1: 'st', 2: 'nd', 3: 'rd'}.get(day % 10, 'th')

def welcome_msg(greet):
    dt =
    day = str( + suffix(
    today = dt.strftime('{d} %B %Y').replace('{d}', day)
    return '{}. Today is {}'.format(greet, today)
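As a quick, self-contained check of the ordinal-suffix logic (the helper is restated here so the snippet runs on its own):

```python
def suffix(day):
    # 11, 12 and 13 are exceptions to the st/nd/rd pattern
    if day in (11, 12, 13):
        return 'th'
    return {1: 'st', 2: 'nd', 3: 'rd'}.get(day % 10, 'th')

print(suffix(1), suffix(2), suffix(11), suffix(21))  # st nd th st
```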

from datetime import datetime
import unittest
from unittest.mock import patch
from views.login_view import welcome_msg

class LoginViewTestCase(unittest.TestCase):

    @patch('views.login_view.suffix', return_value='th')
    def test_greeting(self, suffix_patch):
        expected = '{}. Today is {}{} {}'.format(
            'Hello',, 'th',
  '%B %Y'))
        self.assertRegex(expected, welcome_msg('Hello'))

In doing so, the mocked function has been replaced with a Mock object created by applying the decorator. When called, a Mock object returns its return_value attribute, which can easily be set but by default is a new Mock object.
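This behaviour can be demonstrated with a bare Mock object:

```python
from unittest.mock import Mock

m = Mock()
child = m()        # calling a Mock returns a new, auto-created Mock by default

m.return_value = 42
answer = m()       # once return_value is set, calls return it: 42
```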

It is desirable in many cases to test whether, and how many times, a mocked callable was called. The boolean and integer values provided by called and call_count are useful for this:

from unittest.mock import Mock

mock = Mock(return_value=None)
a = mock.called
b = mock.called
c = mock.call_count

>>> print(a, b, c)
False True 2

side_effect can also be set, which is useful for raising exceptions in order to test error handling:

from django.http import Http404
from unittest.mock import patch

@patch('views.login_view.requests.get', side_effect=Http404)
def test_my_func_raises_http_exception(self, my_patch):
    with self.assertRaises(Http404):
        my_func()  # the function under test (illustrative name)

It is also useful where your mock is going to be called several times, and you want each call to return a different value:

def adder(val):
    return val + 5

def adder_squared(val):
    return adder(val) ** 2

@patch('my_module.adder', side_effect=[1, 2])  # dotted path to where adder is looked up
def test_repeat_caller(self, test_patch):
    resp = adder_squared(3)
    self.assertEqual(resp, 1)
    resp2 = adder_squared(3)
    self.assertEqual(resp2, 4)

Lazy evaluation of Django ORM objects

When you create a Django QuerySet object, no database activity occurs until you do something to evaluate the queryset. Evaluation is forced by the following: iterating over it, calling len() or list() on it, slicing it with the ‘step’ parameter, or testing it in a boolean context.
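As a sketch, assuming a hypothetical Entry model, each of the following forces the query to run:

```python
qs = Entry.objects.filter(headline__contains='python')  # no DB activity yet

list(qs)      # list() forces evaluation
len(qs)       # so does len()
bool(qs)      # boolean context
qs[::2]       # slicing with a 'step' parameter
for entry in qs:           # iterating
    print(entry.headline)
```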

How data is held in memory

When a queryset is created, the cache is empty. When it is evaluated and database interaction occurs, the results of the query are stored and the requested results are returned. In many cases the queryset should be stored and re-used, rather than constructed afresh, in order to avoid unnecessary database lookups:

entry = Entry.objects.get(id=1)  # query performed   # Blog object is retrieved, query performed   # cached version, no DB lookup

entry.authors.all()    # query performed
entry.authors.all()    # query performed again

As caching objects can involve significant memory usage, there is no need for a queryset to be cached if it will not be re-used. As well as the caching of querysets, attributes of ORM objects are also cached: in general, non-callable attributes of ORM objects will be cached, whereas callable attributes cause a DB lookup every time.

Retrieve everything you need in one hit

But not the things you don’t need. Using QuerySet.values() can significantly reduce the overhead of a database lookup and is useful when you just need a dictionary or a list of the values rather than the ORM model objects. QuerySet.select_related() is useful for lookups spanning multiple tables:

class Album(models.Model):
    title = models.CharField(max_length=50)
    year = models.IntegerField()

class Song(models.Model):
    name = models.CharField(max_length=50)
    album = models.ForeignKey(Album, on_delete=models.CASCADE)

song = Song.objects.get(id=5)  # query performed
album = song.album             # query performed again

song = Song.objects.select_related('album').get(id=5)
song.album  # database query not required

QuerySet.select_related() works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to FK and one-to-one relationships.

QuerySet.prefetch_related() serves a similar purpose, but the strategy is quite different. It does a separate lookup for each relationship, and does the “joining” in Python.  This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related.
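Using the Album and Song models from the earlier example (a sketch; 'song_set' assumes Django's default reverse accessor name):

```python
# Two queries in total, rather than one extra query per album:
albums = Album.objects.prefetch_related('song_set')
for album in albums:
    names = [ for song in album.song_set.all()]
    # song_set.all() is served from the prefetched cache; no extra query
```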

Plotting live data using Highcharts and a REST API

Highcharts is a JavaScript charting framework, similar to D3.js, plotly.js and Google Charts. It enables the creation of various types of interactive charts which can easily be integrated on a web site.

The King’s College London API provides live air quality data for sites across London. This REST API exposes data from the database in either JSON or XML format. Calling the API returns data in JSON format (as opposed to HTML), allowing the data to be used directly in Python. The following chart was created using this API together with Highcharts and Flask.

Flask is used since Highcharts is written in HTML5/JavaScript and therefore requires a web browser. The code for this web app is contained within this GitHub repository:

Within a file in the views directory, the get_json function returns a dictionary of air quality monitoring data requested from the London Air API. The function takes in values which specify the site and the number of previous days’ data the user is interested in. String formatting is then used to generate the desired endpoint as a string, which is passed to the requests.get method.

Before the requests library was released, sending HTTP requests relied upon the verbose and cumbersome urllib2 library. The requests library greatly reduces the lines of code needed and is well suited to making RESTful API calls. The get method requires a URL as an argument and allows you to pass optional parameters such as HTTP request headers (e.g. login credentials). Requests’ built-in JSON decoder, called via response.json(), converts the JSON response into a Python dictionary, which in this case contains many layers of nesting.
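As an illustrative sketch (the URL pattern and site code here are assumptions, not the exact London Air endpoint), the request might be built as follows:

```python
site_code, days = 'MY1', 2

# String formatting generates the desired endpoint:
url = ('{}/'
       '{}/Json'.format(site_code, days))

# The call itself would then be, for example:
# response = requests.get(url)  # headers, auth etc. can be passed as kwargs
# data = response.json()        # decodes the JSON response into a dict
```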

The get_data function uses list comprehensions to create lists of pollutant values and hours (for the x and y axes). To avoid any KeyErrors, empty strings are returned instead of None for missing data points. The get_data function passes a dictionary of these lists to the make_chart function, which has a decorator specifying the URL. By providing ‘detail.html’ as a positional argument, Flask’s render_template method passes the key-value pairs required by Highcharts in order to create the desired chart. This HTML template containing the Highcharts JavaScript code is contained within the templates directory.

Adjusting Mouse Scroll Speed in Mac OS Sierra

For many people, the default range of mouse scrolling speeds offered in the Mac OS System Preferences pane does not offer fast enough tracking speeds, especially on larger screens. In previous releases of Mac OS X, 3rd-party preference panes could be used to allow faster speeds and other forms of customisation (including mouse acceleration). Unfortunately, with the release of Mac OS Sierra, several of the internal calls used by these 3rd-party utilities were removed, disabling their functionality.

There is still a way to enable faster mouse scroll speeds in Mac OS Sierra though, as detailed in this Apple Support article.

The new process is as follows:

  1. From the Apple menu, choose System Preferences.
  2. From the System Preferences window, select Accessibility.
  3. In the left sidebar, select Mouse & Trackpad.
  4. Click the Mouse Options button.
  5. In the sheet that appears, use the slider to adjust the mouse scrolling speed, including acceleration.


The “Mouse & Trackpad” accessibility panel sheet gives quite a few good options, although still not as many as some of the 3rd-party preference utilities did.

Dictionaries in Python

Dictionaries are Python’s way of storing key-value pairs, a fundamental data structure in computer science. The data type is summarized in the official documentation as “an unordered set of key: value pairs, with the requirement that the keys are unique”. Dictionaries can be indexed by any immutable data type, and the stored values can be accessed in the following ways:

value = d[key]

value = d.get(key)

value = d.get(key, "no data")

Whereas using [key] will raise a KeyError if the key does not exist, the .get method will either return None, or a default value if one is specified as an optional second argument. Values within nested dictionaries, such as deserialized JSON data, can be accessed by the successive use of [key] or .get(key):

sales = {'data': {'orders': {'january': 240}}}

value = sales['data']['orders']['january']

value = sales.get('data', {}).get('orders', {}).get('january')

The following are all valid ways of creating dictionaries:

my_dict = {'key1': 'value1', 'key2': 'value2'}

my_dict = dict(key1='value1', key2='value2')

my_dict = {x: x**2 for x in values}

my_dict = dict(zip(keys, values))

When the keys are simple strings, it can be convenient to pass them in as keywords to the dict() constructor. A dictionary comprehension is useful for generating keys and values programmatically, while using the zip function inside the dict() constructor is particularly useful for creating dictionaries from separate lists of keys and values.
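For example, pairing separate lists of keys and values:

```python
keys = ['ozone', 'no2', 'pm10']
values = [12, 40, 22]

my_dict = dict(zip(keys, values))
print(my_dict)  # {'ozone': 12, 'no2': 40, 'pm10': 22}
```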

Dictionaries are unordered, except in CPython 3.6+ (where insertion order is preserved as an implementation detail, and guaranteed from Python 3.7). To preserve the insertion order of keys in earlier versions, the dictionary subclass OrderedDict can be used after importing it from the collections module in the standard library.
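A minimal example of OrderedDict preserving insertion order:

```python
from collections import OrderedDict

d = OrderedDict()
d['first'] = 1
d['second'] = 2
d['third'] = 3

print(list(d.keys()))  # ['first', 'second', 'third']
```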

Data visualization libraries for Python

Matplotlib and pandas (a library built on top of NumPy) are a powerful combination for processing and plotting data. The default plotting styles of matplotlib are somewhat basic, but with recent versions the aesthetics can be improved using the style sub-package. A list of available styles can be obtained using the style.available attribute:

from matplotlib import pyplot as plt, style
>>> print(style.available)
['seaborn-deep', 'seaborn-dark', 'fivethirtyeight', 'dark_background', 'seaborn-colorblind', 'seaborn-bright', 'seaborn-notebook', 'seaborn-whitegrid', 'seaborn-dark-palette', 'seaborn-ticks', 'seaborn-pastel', 'seaborn-poster', 'classic', 'seaborn-white', 'grayscale', 'seaborn-paper', 'seaborn-muted', 'seaborn-talk', 'ggplot', 'seaborn-darkgrid', 'bmh']

Then just call style.use() within the code used to generate a plot:'seaborn-white')

Seaborn is a library built on top of matplotlib. It provides various useful plotting functions and the plots it produces tend to be visually attractive. Seaborn is especially useful for exploring statistical data and for use with more complex data sets.

The choice of library should largely depend upon the desired visualization. Matplotlib on its own is very powerful and should be used for simple bar, line, pie, scatter plots etc. More complicated plots will require significantly more lines of code and seaborn will usually be more appropriate in these cases.

Bokeh was created with the aim of providing attractive and interactive plots in the style of the JavaScript D3.js library. Since Bokeh is higher level than D3.js, interactive visualizations can generally be created with much less effort. The documentation is fairly comprehensive; however, the library is still under heavy development, so it may be best avoided if future compatibility is a concern.

Avoiding multi-table inheritance in Django Models

Model inheritance does not have a natural translation to relational database architecture, and so models in Django should be designed to avoid harming database performance. When there is no need for the base model to be translated into a table of its own, abstract inheritance should be used instead of multi-table inheritance.

Given the following model:

class Person(Model):
    name = CharField()

class Employee(Person):
    department = CharField()

Two tables will be created, and what looks like a simple query on the Employee child class will actually involve an automatically generated join. The same example with abstract = True in the inner Meta class uses abstract inheritance instead:

class Person(Model):
    name = CharField()

    class Meta:
        abstract = True

class Employee(Person):
    department = CharField()

By setting abstract = True, the extra table for the base model is not created, and the fields of the base model are created on each child model’s table. This avoids unnecessary joins when accessing those fields, and this way of using model inheritance also avoids repetition of code within the child classes.