The clld database models are declared using SQLAlchemy’s declarative extension. In particular we follow the approach of mixins and custom base class, to provide building blocks with enough shared commonality for custom data models.

Declarative base and mixins

class clld.db.meta.Base(jsondata=None, **kwargs)[source]

The declarative base for all our models.

classmethod get(value, key=None, default=<NoDefault>, session=None)[source]

Convenience method to query a model where exactly one result is expected.

e.g. to retrieve an instance by primary key or id.

  • value – The value used in the filter expression of the query.
  • key (str) – The key or attribute name to be used in the filter expression. If None is passed, defaults to pk if value is int otherwise to id.

return result proxy to iterate over previous versions of a record.

jsondata = Column(None, JSONEncodedDict(), table=None)

To allow storage of arbitrary key,value pairs with typed values, each model provides a column to store JSON encoded dicts.


Deprecated convenience function.

Use jsondata directly instead, which is guaranteed to be a dictionary.

pk = Column(None, Integer(), table=None, primary_key=True, nullable=False)

All our models have an integer primary key which has nothing to do with the kind of data stored in a table. ‘Natural’ candidates for primary keys should be marked with unique constraints instead. This adds flexibility when it comes to database changes.


Convenience function.

Since we use the simple JSON encoded dict recipe without mutation tracking, we provide a convenience method to update

class clld.db.meta.CustomModelMixin[source]

Mixin for customized classes in our joined table inheritance scheme.


With this scheme there can be only one specialized mapper class per inheritable base class.

class clld.db.models.common.IdNameDescriptionMixin[source]

Mixin for ‘visible’ objects, i.e. anything that has to be displayed.

In particular all Resources fall into this category.


Only one of clld.db.models.common.IdNameDescriptionMixin.description or clld.db.models.common.IdNameDescriptionMixin.markup_description should be supplied, since these are used mutually exclusively.

description = Column(None, Unicode(), table=None)

A description of the object.

id = Column(None, String(), table=None)

A str identifier of an object which can be used for sorting and as part of a URL path; thus should be limited to characters valid in URLs, and should not contain ‘.’ or ‘/’ since this may trip up route matching.

markup_description = Column(None, Unicode(), table=None)

A description of the object containing HTML markup.

name = Column(None, Unicode(), table=None)

A human readable ‘identifier’ of the object.

While the above mixin only adds columns to a model, the following mixins do also add relations between models, thus have to be used in combination, tied together by naming conventions.

class clld.db.models.common.DataMixin[source]

Provide a simple way to attach key-value pairs to a model class given by name.

class clld.db.models.common.HasDataMixin[source]

Adds a convenience method to retrieve the key-value pairs from data as dict.


It is the responsibility of the programmer to make sure conversion to a dict makes sense, i.e. the keys in data are actually unique, thus usable as dictionary keys.


return dict of associated key-value pairs.

class clld.db.models.common.FilesMixin[source]

This mixin provides a way to associate files with instances of another model class.


The file itself is not stored in the database but must be created in the filesystem, e.g. using the create method.

create(dir_, content)[source]

Write content to a file using dir_ as file-system directory.

Returns:File-system path of the file that was created.
mime_type = Column(None, String(), table=None)

Mime-type of the file content.

ord = Column(None, Integer(), table=None, default=ColumnDefault(1))

Ordinal to control sorting of files associated with one db object.


OS file path of the file relative to the application’s file-system dir.

class clld.db.models.common.HasFilesMixin[source]

Mixin for model classes which may have associated files.


return dict of associated files keyed by id.

Typical usage looks like

class MyModel_data(Base, Versioned, DataMixin):

class MyModel_files(Base, Versioned, FilesMixin):

class MyModel(Base, HasDataMixin, HasFilesMixin):

Core models

The CLLD data model includes the following entities commonly found in linguistic databases and publications:

class clld.db.models.common.Dataset(jsondata=None, **kwargs)[source]

Represents a database.

Each project (e.g. WALS, APiCS) is regarded as one dataset; thus, each app will have exactly one Dataset object.


primary key


date of publication




place of publication

class clld.db.models.common.Language(jsondata=None, **kwargs)[source]

Languages are the main objects of discourse.

We attach a geo-coordinate to them to be able to put them on maps.


geographical latitude in WGS84


geographical longitude in WGS84


primary key

class clld.db.models.common.Parameter(jsondata=None, **kwargs)[source]

A measurable attribute of a language.


primary key

class clld.db.models.common.ValueSet(jsondata=None, **kwargs)[source]

The intersection of Language and Parameter.


primary key


textual description of the source for the valueset

class clld.db.models.common.Value(jsondata=None, **kwargs)[source]

A measurement of a parameter for a particular language.


textual assessment of the reliability of the value assignment


Languages may have multiple values for the same parameter. Their relative frequency can be stored here.

class clld.db.models.common.Contribution(jsondata=None, **kwargs)[source]

A set of data contributed within the same context by the same contributors.


primary key

class clld.db.models.common.Contributor(jsondata=None, **kwargs)[source]

Creator of a contribution.


primary key

class clld.db.models.common.Source(jsondata=None, **kwargs)[source]

A bibliographic record, cited as source for some statement.


primary key

class clld.db.models.common.Unit(jsondata=None, **kwargs)[source]

A linguistic unit of a language.


primary key

class clld.db.models.common.UnitParameter(jsondata=None, **kwargs)[source]

A measurable attribute of a unit.


primary key

class clld.db.models.common.UnitValue(jsondata=None, **kwargs)[source]

primary key

validate_parameter_pk(key, unitparameter_pk)[source]

Validator to sync related parameter.

We have to make sure, the parameter a value is tied to and the parameter a possible domainelement is tied to stay in sync.


Versioned model objects are supported via the clld.db.versioned.Versioned mixin, implemented following the corresponding SQLAlchemy ORM Example.


Migrations provide a mechanism to update the database model (or the data) in a controlled and repeatable way. clld apps use Alembic to implement migrations.

Since a migration may change the database schema, it is generally not possible to fully use ORM mechanisms in migration scripts. Instead, migration scripts typically construct SQL to be sent to the database “by hand”, or using SQLAlchemy’s SQL expression language. Now dropping down to these lower levels of database access makes scripts verbose and error prone. Thus, clld provides a module with helpers for Alembic migration scripts.

Functionality for alembic scripts.

This module provides

  • basic crud functionality within alembic migration scripts,
  • advanced helpers for special tasks, like merging sources.


Using the functionality provided in this module is not possible for Alembic scripts supposed to be run in offline mode.

class clld.db.migration.Connection(conn)[source]

A wrapper around an SQLAlchemy connection.

This wrapper provides the convenience of allowing typical CRUD operations to be called passing model classes.

Additionally, it implements more complicated clld domain specific database operations.

A Connection will typically be instantiated in an Alembic migration script as follows:

from alembic import op
conn = Connection(op.get_bind())
all(model, **where)[source]

return all results of a select statement.

delete(model, **where)[source]

Run a delete statement.

execute(*args, **kw)[source]

Provide access to the underlying connection’s execute method.

first(model, **where)[source]

return first result of a select statement or None.

get(model, pk)[source]

return row specified by primary key.

insert(model, **values)[source]

Run an insert statement.

Returns:primary key of the inserted row.
pk(model, id_, attr='id')[source]

Get the primary key of an object specified by a unique property.

  • model – model class.
  • id – Value to be used when filtering.
  • attr – Column to be used for filtering.

primary key of (first) matching row.

select(model, **where)[source]

Run a select statement and return a ResultProxy.

set_glottocode(lid, gc, gcid=None)[source]

assign a unique glottocode to a language.

i.e. alternative glottocodes will be deleted.

  • lidid of the language.
  • gc – Glottocode to be assigned.
  • gcidid of the Identifier instance if one has to be created; defaults to gc.
update(model, values, **where)[source]

Run an update statement.