Data modeling
This chapter describes how to model cross-linguistic data using the core resources
available in the clld framework. While it is possible to extend the core data model
in various ways, sticking to core resources for comparable concepts will ensure
re-usability of the data, because all of the data publication mechanisms implemented
in clld will be available.
Dataset
Each clld app is assumed to serve a cross-linguistic dataset. The
clld.db.models.common.Dataset object holds metadata about the dataset, e.g.
the publisher, the license and relations to editors.
Languages
Languages are the core objects which are described in datasets served by clld apps.
clld.db.models.common.Language objects - like most other objects - are described
at the most basic level by a name, an optional description and an optional geographical
coordinate.
To allow identification of languages across apps or even domains, languages can be
associated with any number of alternative
clld.db.models.common.Identifier objects, typically Glottocodes, ISO 639-3
codes or alternative names.
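The shape of this data can be sketched with plain Python dataclasses. The classes below are simplified stand-ins written for illustration only; they are not the SQLAlchemy models in clld.db.models.common, and the identifier type labels ("glottolog", "iso639-3", "name") as well as the codes() helper are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Identifier:
    # Simplified stand-in, not clld.db.models.common.Identifier.
    type: str  # kind of identifier; the labels used below are assumptions
    name: str  # the code or alternative name itself


@dataclass
class Language:
    # Simplified stand-in, not clld.db.models.common.Language.
    id: str
    name: str
    description: Optional[str] = None  # optional, as described above
    latitude: Optional[float] = None   # optional geographical coordinate
    longitude: Optional[float] = None
    identifiers: List[Identifier] = field(default_factory=list)

    def codes(self, type_: str) -> List[str]:
        """Return all identifiers of the given type (hypothetical helper)."""
        return [i.name for i in self.identifiers if i.type == type_]


german = Language(
    id="deu",
    name="German",
    latitude=51.0,   # rough, illustrative coordinate
    longitude=10.0,
    identifiers=[
        Identifier("glottolog", "stan1295"),
        Identifier("iso639-3", "deu"),
        Identifier("name", "Deutsch"),
    ],
)

print(german.codes("glottolog"))
```

Keeping identifiers in a separate list, rather than as fixed attributes of the language, is what allows any number of codes and alternative names per language.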
Parameters
clld.db.models.common.Parameter objects are used to model language parameters,
i.e. phenomena (aka features) which can be measured across languages. Single datapoints,
i.e. measurements of the parameter for a single language, are modeled as instances of
clld.db.models.common.Value. To support multiple measurements for the same
(language, parameter) pair, values are grouped in a
clld.db.models.common.ValueSet, and it is the valueset that is related to
language and parameter.
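The grouping of values into valuesets can be illustrated with a minimal sketch. The dataclasses below are simplified stand-ins, not the clld models, and the ids, the example measurements and the values_for() helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Value:
    # One single measurement (stand-in for clld.db.models.common.Value).
    id: str
    name: str


@dataclass
class ValueSet:
    # Groups all measurements for one (language, parameter) pair
    # (stand-in for clld.db.models.common.ValueSet).
    id: str
    language_id: str
    parameter_id: str
    values: List[Value] = field(default_factory=list)


def values_for(valuesets: List[ValueSet], language_id: str, parameter_id: str) -> List[Value]:
    """Collect all measurements recorded for one (language, parameter) pair."""
    return [
        value
        for valueset in valuesets
        if valueset.language_id == language_id and valueset.parameter_id == parameter_id
        for value in valueset.values
    ]


# Hypothetical case: two measurements of the same parameter for one language.
vs = ValueSet(id="lang1-wordorder", language_id="lang1", parameter_id="wordorder")
vs.values.append(Value(id="lang1-wordorder-1", name="SOV"))
vs.values.append(Value(id="lang1-wordorder-2", name="SVO"))

print([v.name for v in values_for([vs], "lang1", "wordorder")])
```

Note that the individual values carry no reference to a language or parameter of their own; that relation lives on the valueset only.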
Enumerated domain
clld supports enumerated domains. Elements of the domain of a parameter can be modeled
as clld.db.models.common.DomainElement instances and each value must then be
related to one domain element.
The clld framework will then use the domain property of a parameter to select
behaviour suitable for enumerated domains only, e.g. loading values associated with one
domain element as a separate layer when displaying a parameter map.
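The idea of one map layer per domain element can be sketched as follows. This is not how clld implements its maps; the dataclasses are simplified stand-ins and map_layers() is a hypothetical helper that only shows the grouping step.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DomainElement:
    # One element of a parameter's enumerated domain (simplified stand-in).
    id: str
    name: str


@dataclass
class Value:
    # A measurement, optionally related to one domain element (simplified stand-in).
    language_id: str
    domainelement: Optional[DomainElement] = None


def map_layers(values: List[Value]) -> Dict[str, List[str]]:
    """Group language ids by domain element name, one group per map layer."""
    layers = defaultdict(list)
    for value in values:
        if value.domainelement is not None:
            layers[value.domainelement.name].append(value.language_id)
    return dict(layers)


sov = DomainElement("wordorder-1", "SOV")
svo = DomainElement("wordorder-2", "SVO")
values = [Value("lang1", sov), Value("lang2", svo), Value("lang3", sov)]

layers = map_layers(values)
print(layers)
```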
Typed values
The clld framework is agnostic with regard to the types of values, i.e. as far as
default functionality is concerned the only properties required of a value are a name
and an id (and optionally a description). To store typed data for values,
several mechanisms are available:
Storing typed data in the jsondata dictionary: this accommodates all data types which can be serialized as JSON, i.e. numbers, booleans, arrays, dictionaries.
If the data for a value comes as a list or dictionary of strings, it can also be stored as clld.db.models.common.Value_data instances.
Finally, there is the option to store data related to a value as files, i.e. as instances of clld.db.models.common.Value_files.
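What "can be serialized as JSON" means in practice can be checked with the standard library alone. The sketch below does not touch clld; the dictionary keys and values are made up for illustration.

```python
import json

# Hypothetical typed data for a single value: a number, a boolean,
# an array and a nested dictionary.
typed = {
    "frequency": 0.42,
    "attested": True,
    "allophones": ["p", "b"],
    "counts": {"corpus_a": 12},
}

# Whatever survives a JSON round trip unchanged is safe to store this way.
restored = json.loads(json.dumps(typed))
print(restored == typed)

# Types without a JSON counterpart are converted or rejected:
# a tuple comes back as a list, and a set cannot be serialized at all.
tuple_roundtrip = json.loads(json.dumps((1, 2)))
try:
    json.dumps({"codes": {"a", "b"}})
    set_is_serializable = True
except TypeError:
    set_is_serializable = False
```

Data that is a flat collection of strings needs no such check, which is why the key/value storage described above is sufficient for it.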