Plotting live data using Highcharts and a REST API

Highcharts is a JavaScript charting library, similar to D3.js, plotly.js and Google Charts. It enables the creation of many types of interactive charts that can easily be integrated into a website.

The King’s College London API provides live air quality data for sites across London. This REST API exposes data from its database in either JSON or XML. Requesting the JSON format returns raw data (rather than an HTML page) that can be used directly in Python. The following chart was created using this API together with Highcharts and Flask.

Flask is used because Highcharts is written in HTML5/JavaScript and therefore needs a web browser to render the chart. The code for this web app is in this GitHub repository: https://github.com/paulos84/airapp3

Within the charts.py file in the views directory, the get_json function returns a dictionary of air quality monitoring data requested from the London Air API. The function takes values specifying the site and the number of previous days of data the user is interested in. String formatting is then used to build the desired endpoint URL, which is passed to the requests get method.
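As an illustration of the string-formatting step, here is a minimal sketch of building such an endpoint. It is not the code from the repository: the function name build_endpoint, the URL template and the example site code MY1 are assumptions modelled on the London Air API's path-style URLs, so check them against the API's own documentation before relying on them.

```python
from datetime import date, timedelta

# ASSUMED URL layout, modelled on the London Air API; verify before use.
ENDPOINT_TEMPLATE = (
    "http://api.erg.kcl.ac.uk/AirQuality/Data/Site/"
    "SiteCode={site}/StartDate={start}/EndDate={end}/Json"
)


def build_endpoint(site_code, days, end_date=None):
    """Return the (hypothetical) endpoint covering the previous `days` days."""
    end = end_date or date.today()
    start = end - timedelta(days=days)
    # date.isoformat() gives the YYYY-MM-DD form used in the path
    return ENDPOINT_TEMPLATE.format(
        site=site_code, start=start.isoformat(), end=end.isoformat()
    )


print(build_endpoint("MY1", 3, end_date=date(2017, 6, 10)))
```

Passing end_date explicitly keeps the function deterministic for testing; leaving it out uses today's date.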

Before the requests library was released, sending HTTP requests relied on the verbose and cumbersome urllib2 library. Requests greatly reduces the lines of code needed and is well suited to making RESTful API calls. The get method requires a URL as an argument and accepts optional parameters such as HTTP request headers (e.g. login credentials). Requests’ built-in JSON decoder, called via response.json(), converts the JSON response into a Python dictionary, which in this case contains many levels of nesting.
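A minimal sketch of the request itself, assuming the endpoint URL has already been built. fetch_json is a hypothetical helper, not the repository's function; its optional session parameter lets any object with a requests-style get method stand in for the library, which makes the helper testable without network access.

```python
def fetch_json(url, headers=None, session=None, timeout=10):
    """Request `url` and decode the JSON body into Python objects."""
    if session is None:
        # Imported lazily so the helper can be defined (and exercised with a
        # stand-in session) even where requests is not installed.
        import requests
        session = requests
    response = session.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.json()       # requests' built-in JSON decoder
```

Typical use would be data = fetch_json(url, headers={"Accept": "application/json"}); passing a requests.Session() as session reuses the underlying connection across calls.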

The get_data function uses list comprehensions to create lists of the hours and the pollutant values (for the x and y axes respectively). To avoid any KeyErrors, empty strings are returned instead of None for missing data points. The get_data function passes a dictionary of these lists to the make_chart view function, whose route decorator specifies the URL. Flask’s render_template function, given ‘detail.html’ as its first positional argument, renders that template with the key-value pairs Highcharts needs to create the desired chart. This HTML template, containing the Highcharts JavaScript code, lives in the templates directory.
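The sketch below shows the kind of list comprehensions get_data might use. The nested layout it expects (an "AirQualityData" object holding a "Data" list of readings with "@SpeciesCode", "@MeasurementDateGMT" and "@Value" fields) is an assumption about the API's JSON, not taken from the repository; adjust the keys to whatever the response actually contains.

```python
def get_data(json_data, species_code):
    """Split one pollutant's readings into x (hours) and y (values) lists."""
    # ASSUMED response shape: {"AirQualityData": {"Data": [reading, ...]}}
    readings = json_data.get("AirQualityData", {}).get("Data", [])
    rows = [r for r in readings if r.get("@SpeciesCode") == species_code]
    # x axis: the HH:MM part of each timestamp, e.g. "2017-06-10 13:00:00" -> "13:00"
    hours = [r.get("@MeasurementDateGMT", "")[11:16] for r in rows]
    # y axis: .get() with a "" default means a missing reading never raises KeyError
    values = [r.get("@Value", "") for r in rows]
    return {"hours": hours, "values": values}
```

The returned dictionary is the sort of object that could then be unpacked into render_template as keyword arguments.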


Dictionaries in Python

The dictionary is Python’s way of storing key-value pairs, a fundamental data structure in computer science. The official documentation summarizes the type as “an unordered set of key: value pairs, with the requirement that the keys are unique”. Dictionaries can be indexed by any immutable (more precisely, hashable) type, and the stored values can be accessed in the following ways:

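value = d[key]                 # raises KeyError if key is missing

value = d.get(key)             # returns None if key is missing

value = d.get(key, "no data")  # returns "no data" if key is missing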

Whereas [key] raises a KeyError if the key does not exist, the .get method returns either None or a default value, if one is passed as the optional second argument. Values within nested dictionaries, such as deserialized JSON data, can be accessed through successive use of [key] or .get(key):

sales = {'data':{'orders':{'january':240}}}

sales['data']['orders']['january']

sales.get('data').get('orders').get('january')

sales['data']['orders'].get('january')
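Chaining .get() only protects the last level: if 'data' or 'orders' is missing, the first .get() returns None and the next call fails with AttributeError. A small helper can walk the keys and fall back to a default at any depth; nested_get is a hypothetical name, not a standard library function.

```python
def nested_get(data, *keys, default=None):
    """Follow `keys` through nested dicts, returning `default` if any level is missing."""
    current = data
    for key in keys:
        # Stop early if this level is absent or is not a dict at all
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


sales = {'data': {'orders': {'january': 240}}}
print(nested_get(sales, 'data', 'orders', 'january'))   # 240
print(nested_get(sales, 'data', 'refunds', 'january'))  # None
```

For a single fixed path, sales.get('data', {}).get('orders', {}).get('january') achieves the same without a helper.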

The following are all valid ways of creating dictionaries:

keys = ['key1', 'key2']  # sample inputs for the last two examples
values = [1, 2]

my_dict = {'key1': 'value1', 'key2': 'value2'}

my_dict = dict(key1='value1', key2='value2')

my_dict = {x: x**2 for x in values}

my_dict = dict(zip(keys, values))

When the keys are simple strings, it can be convenient to pass them as keyword arguments to the dict() constructor. The literal {} syntax is the most performant way of creating dictionaries, while dict comprehensions are useful for generating arbitrary keys and values. Using the zip function inside the dict() constructor is particularly useful for creating dictionaries from separate lists of keys and values.
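The performance difference is easy to check on your own machine with the standard library’s timeit module. This sketch times the literal against the keyword-argument form; absolute numbers vary by machine and Python version, so no figures are quoted here.

```python
from timeit import timeit


def literal():
    return {'key1': 'value1', 'key2': 'value2'}


def constructor():
    return dict(key1='value1', key2='value2')


# Both build the same dictionary...
assert literal() == constructor()

# ...but the literal compiles to dedicated bytecode, whereas dict() needs a
# name lookup plus a function call with keyword arguments.
for fn in (literal, constructor):
    seconds = timeit(fn, number=100_000)
    print(f"{fn.__name__}: {seconds:.4f}s")
```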

Before Python 3.7, dictionaries were unordered: CPython 3.6 preserved insertion order only as an implementation detail, and Python 3.7 made it a language guarantee. To preserve insertion order on older versions, use OrderedDict, a dict subclass imported from the collections module in the standard library.
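OrderedDict still has uses on modern Python: it offers order-aware operations such as move_to_end(), and equality between two OrderedDicts takes order into account. A short illustration:

```python
from collections import OrderedDict

od = OrderedDict()
od['first'] = 1
od['second'] = 2
od['third'] = 3
od.move_to_end('first')  # OrderedDict-specific: move a key to the end
print(list(od))          # ['second', 'third', 'first']

# Comparing two OrderedDicts is order-sensitive; plain dicts are not
print(OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1))  # False
print(dict(a=1, b=2) == dict(b=2, a=1))                # True
```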

Data visualization libraries for Python

Matplotlib and pandas (a library built on top of NumPy) are a powerful combination for processing and plotting data. The default plotting styles of matplotlib are somewhat basic, but with recent versions the aesthetics can be improved using the style sub-package. A list of available styles can be obtained using the style.available attribute:

>>> from matplotlib import pyplot as plt
>>> print(plt.style.available)
['seaborn-deep', 'seaborn-dark', 'fivethirtyeight', 'dark_background', 'seaborn-colorblind', 'seaborn-bright', 'seaborn-notebook', 'seaborn-whitegrid', 'seaborn-dark-palette', 'seaborn-ticks', 'seaborn-pastel', 'seaborn-poster', 'classic', 'seaborn-white', 'grayscale', 'seaborn-paper', 'seaborn-muted', 'seaborn-talk', 'ggplot', 'seaborn-darkgrid', 'bmh']

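Then just call plt.style.use() within the code used to generate a plot:

plt.style.use('seaborn-white')

(From matplotlib 3.6 onwards, the seaborn styles carry a seaborn-v0_8- prefix, e.g. 'seaborn-v0_8-white'.)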

Seaborn is a library built on top of matplotlib. It provides various useful plotting functions and the plots it produces tend to be visually attractive. Seaborn is especially useful for exploring statistical data and for use with more complex data sets.

The choice of library should largely depend on the desired visualization. Matplotlib on its own is very powerful and well suited to simple bar, line, pie and scatter plots. More complicated plots require significantly more code in matplotlib, and seaborn will usually be more appropriate in these cases.

Bokeh was created with the aim of providing attractive, interactive plots in the style of the JavaScript D3.js library. Since Bokeh is higher level than D3.js, interactive visualizations can generally be created with much less effort. The documentation is fairly comprehensive; however, the library is still under heavy development, so it may be best avoided if future compatibility is a concern.

Avoiding multi-table inheritance in Django Models

Model inheritance does not map naturally onto a relational database schema, so Django models should be designed to avoid a hit to database performance. When the base model does not need its own table, abstract inheritance should be used instead of multi-table inheritance.

Given the following model:

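from django.db.models import CharField, Model

class Person(Model):
    name = CharField(max_length=100)  # max_length is required; 100 is an arbitrary example
    # …

class Employee(Person):
    department = CharField(max_length=100)
    # …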

Django creates two tables, linked by an implicit one-to-one field on Employee, so what looks like a simple query on the Employee child class actually involves an automatically generated join. Setting abstract = True in the base model’s inner Meta class switches the same example to abstract inheritance:

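class Person(Model):
    name = CharField(max_length=100)
    # …

    class Meta:  # must be nested inside Person to take effect
        abstract = True  # no table is created for Person

class Employee(Person):
    department = CharField(max_length=100)
    # …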

With abstract = True, no table is created for the base model; instead, its fields are added to each child model’s table. This avoids the unnecessary joins otherwise needed to access those fields, while still sparing the child classes from repeating the shared field definitions.

PyCharm Auto-import works differently to PhpStorm and IntelliJ

Having become used to developing in JetBrains’ PhpStorm and IntelliJ IDEs, we now find it tedious to break out of the programming flow to type out imports manually every time we introduce a new dependency.

However, in PyCharm, that company’s Python IDE, auto-completion works differently. The

  [ctrl] + [space]

keyboard shortcut still offers suggestions but doesn’t include classes that haven’t been imported, whereas the

  [ctrl] + [alt] + [space]

keyboard shortcut does! It displays all available classes and auto-generates the import statement for you, just like JetBrains’ other IDEs.

Quickly get memcached working in Python Django

As with most frameworks, Django can use caching to greatly improve performance for many common requests. Here we will look at memcached, as it has good Django support and is widely used in production, although there is also Redis support, which improves on memcached in some respects such as data persistence.

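  1. Install memcached on your server:

    RedHat Linux:

    yum install memcached

    Ubuntu / Debian Linux:

    apt-get install memcached
  2. Let Django know how to access memcached. In Django’s settings.py file, add a CACHES setting:

    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
            'LOCATION': '127.0.0.1:11211',
        }
    }

    LOCATION assumes memcached is running locally on its default port, 11211. MemcachedCache (which needs the python-memcached package) was deprecated in Django 3.2 and removed in 4.1; on newer versions use 'django.core.cache.backends.memcached.PyMemcacheCache' (which needs pymemcache).
  3. Load the cache within your application:

    from django.core.cache import cache
  4. Save the value to the cache:

    cache.set('exampleValue', exampleValue)
  5. Retrieve the value from the cache:

    exampleValue = cache.get('exampleValue')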

The beauty of this is that exampleValue can be anything from a computed or database-retrieved value to large blocks of static text or a URL.

The only problem with caches is that they don’t always contain the data you expect: what if the value was flushed or hasn’t been stored yet? Let’s rewrite the retrieval step to handle the case where the value is not available in the cache:

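exampleValue = cache.get('exampleValue')
if exampleValue is None:  # cache miss: the key was never set or has been evicted
    exampleValue = exampleValueLookup()  # hypothetical function that regenerates the value
    cache.set('exampleValue', exampleValue)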

Here the value of exampleValue is retrieved from the cache, with a fallback that regenerates and stores it if it has not been set. In a real application this would usually be encapsulated in a getExampleValue function or somewhere similar.
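A minimal sketch of that encapsulation, written in snake_case as get_example_value. To keep it runnable without a Django project, DictCache is a hypothetical in-memory stand-in whose get and set methods mirror the call shapes of django.core.cache.cache; in an application you would pass the real cache object instead. Passing a sentinel as the default to get() distinguishes a genuine cache miss from any cached value, including falsy ones such as 0, an empty string or even None.

```python
_MISSING = object()  # sentinel: "not in the cache", distinct from any stored value


class DictCache:
    """Hypothetical in-memory stand-in for Django's cache (get/set only)."""

    def __init__(self):
        self._store = {}

    def get(self, key, default=None):
        return self._store.get(key, default)

    def set(self, key, value, timeout=None):
        self._store[key] = value  # timeout is ignored by this stand-in


def get_example_value(cache, lookup):
    """Return the cached value, regenerating and storing it on a miss."""
    value = cache.get('exampleValue', _MISSING)
    if value is _MISSING:
        value = lookup()  # `lookup` is any callable that rebuilds the value
        cache.set('exampleValue', value)
    return value
```

Django’s cache API also provides cache.get_or_set(key, default, timeout), which accepts a callable default and performs the same get-then-set sequence in a single call.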