Reading delimiter-separated values (DSV)

Support for reading and writing delimiter-separated value files.


normalize_name converts strings to ASCII strings that can pass as Python attribute names, for use with namedtuples.

>>> assert normalize_name('class') == 'class_'
>>> assert normalize_name('a-name') == 'a_name'
>>> assert normalize_name('a näme') == 'a_name'
>>> assert normalize_name('Name') == 'Name'
>>> assert normalize_name('') == '_'
>>> assert normalize_name('1') == '_1'
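The doctests above pin down the expected behaviour. One way to get it is sketched below. This is a hypothetical reimplementation (normalize_name_sketch), not clld's actual code, and it assumes that stripping diacritics via Unicode NFKD decomposition is an acceptable way to reach ASCII.

```python
import keyword
import re
import unicodedata


def normalize_name_sketch(s):
    """Hypothetical sketch: turn a string into a valid Python identifier."""
    # Decompose accented characters (NFKD) and drop everything non-ASCII,
    # so that 'ä' becomes 'a'.
    s = unicodedata.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii')
    # Replace anything not allowed in an identifier with '_'.
    s = re.sub(r'[^0-9A-Za-z_]', '_', s)
    # Identifiers must not be empty or start with a digit.
    if not s or s[0].isdigit():
        s = '_' + s
    # Python keywords such as 'class' get a trailing underscore.
    if keyword.iskeyword(s):
        s += '_'
    return s
```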
clld.lib.dsv.reader(lines_or_file, namedtuples=False, dicts=False, encoding=u'utf8', **kw)

Parameters:
  • lines_or_file – Content to be read. Either a file handle, a file path or a list of strings.
  • namedtuples – Yield namedtuples.
  • dicts – Yield dicts.
  • encoding – Encoding of the content.
  • kw – Keyword parameters are passed through to csv.reader. Note that, unlike with csv.reader, delimiter defaults to ‘\t’ (tab), not ‘,’.

Returns: a generator over the rows.
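The parameters above can be approximated with the standard csv module. The sketch below is hypothetical (read_dsv_sketch is not part of clld) and assumes that the default delimiter is a tab, that the first row is the header when namedtuples or dicts are requested, and that a plain string argument is a file path. Instead of clld's own name normalization it swaps in namedtuple's rename=True to cope with invalid field names.

```python
import csv
from collections import namedtuple


def _iter_rows(lines, namedtuples, dicts, kw):
    rows = csv.reader(lines, **kw)
    if not (namedtuples or dicts):
        # Plain mode: yield each row as a list of strings.
        yield from rows
        return
    # Otherwise the first row is taken as the header.
    header = next(rows, None)
    if header is None:
        return
    if namedtuples:
        # rename=True replaces field names that are not valid identifiers.
        Row = namedtuple('Row', header, rename=True)
        for row in rows:
            yield Row(*row)
    else:
        for row in rows:
            yield dict(zip(header, row))


def read_dsv_sketch(lines_or_file, namedtuples=False, dicts=False,
                    encoding='utf8', **kw):
    """Hypothetical sketch of a DSV reader with tab as default delimiter."""
    kw.setdefault('delimiter', '\t')
    if isinstance(lines_or_file, (list, tuple)):
        # A list of strings: csv.reader accepts any iterable of lines.
        yield from _iter_rows(lines_or_file, namedtuples, dicts, kw)
    elif isinstance(lines_or_file, str):
        # A string is treated as a file path.
        with open(lines_or_file, encoding=encoding, newline='') as fp:
            yield from _iter_rows(fp, namedtuples, dicts, kw)
    else:
        # Anything else is assumed to be an open text-mode file handle.
        yield from _iter_rows(lines_or_file, namedtuples, dicts, kw)
```

For example, list(read_dsv_sketch(['a\tb', '1\t2'], dicts=True)) yields one dict mapping 'a' to '1' and 'b' to '2'; passing delimiter=',' switches to comma-separated input.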


Functionality to gather information about ISO 639-3 codes from the SIL site.


Retrieve a resource from the SIL site and return its representation.


Scrape information about an ISO 639-3 code from its documentation page.


Generator for the entries in a tab-separated file, specified by name.
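Reading entries from a tab file can be sketched with csv.DictReader. This is a hypothetical helper (iter_tab_entries): it takes an open file rather than a name, assumes the first line holds the column names, and the sample columns below are made up for illustration.

```python
import csv
import io


def iter_tab_entries(fp):
    """Hypothetical sketch: yield one dict per line of a tab-separated file."""
    # The first line is assumed to hold the column names.
    # QUOTE_NONE keeps literal quote characters as part of the data.
    reader = csv.DictReader(fp, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Usage with an in-memory file; the columns are illustrative only.
sample = io.StringIO('code\tname\nabc\tFoo\n')
entries = list(iter_tab_entries(sample))
```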


Retrieve the current (date-stamped) names of the download files from SIL’s download page.


This module provides functionality for handling our data as RDF.

class clld.lib.rdf.ClldGraph(*args, **kw)

Augments the standard rdflib.Graph, making sure our standard namespace prefixes are always bound.

class clld.lib.rdf.Notation

Notation(name, extension, mimetype, uri)


  • name – Alias for field number 0
  • extension – Alias for field number 1
  • mimetype – Alias for field number 2
  • uri – Alias for field number 3
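Because Notation(name, extension, mimetype, uri) is a namedtuple, each field name is simply an alias for a tuple index. A minimal stand-in is shown below; the Turtle values are an illustrative assumption and not necessarily one of the notations clld defines.

```python
from collections import namedtuple

# Same field order as the signature Notation(name, extension, mimetype, uri).
Notation = namedtuple('Notation', 'name extension mimetype uri')

# Example values chosen for illustration only.
turtle = Notation('turtle', 'ttl', 'text/turtle',
                  'http://www.w3.org/ns/formats/Turtle')

# Attribute access and index access refer to the same values:
# turtle.name is turtle[0], turtle.extension is turtle[1], and so on.
```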

clld.lib.rdf.properties_as_xml_snippet(subject, props)

A somewhat ugly way to get at a snippet of the RDF/XML serialization of a subject’s properties.


Functionality to handle bibliographical data in the BibTeX format.

class clld.lib.bibtex.Database(records)

A class to handle BibTeX databases, i.e. a container class for Record instances.

classmethod from_file(bibFile, encoding='utf8', lowercase=False)

A BibTeX database defined by a .bib file.

Parameters:
  • bibFile – Path of the BibTeX database file to be read.


Map BibTeX record ids to their list index.
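Such a map allows looking up a record by id without scanning the list. A minimal sketch with hypothetical names (Rec stands in for Record, and the ids are made up):

```python
from collections import namedtuple

# Hypothetical minimal stand-in for Record: only the id matters here.
Rec = namedtuple('Rec', 'id genre')

records = [Rec('smith2001', 'book'), Rec('doe1999', 'article')]

# Map each record id to its position in the records list.
keymap = {rec.id: index for index, rec in enumerate(records)}

# Look up a record by id via its list index.
doe = records[keymap['doe1999']]
```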

class clld.lib.bibtex.EntryType
  • article – An article from a journal or magazine. Required fields: author, title, journal, year. Optional fields: volume, number, pages, month, note, key.
  • book – A book with an explicit publisher. Required fields: author/editor, title, publisher, year. Optional fields: volume/number, series, address, edition, month, note, key.
  • booklet – A work that is printed and bound, but without a named publisher or sponsoring institution. Required fields: title. Optional fields: author, howpublished, address, month, year, note, key.
  • conference – The same as inproceedings, included for Scribe compatibility.
  • inbook – A part of a book, usually untitled. May be a chapter (or section or whatever) and/or a range of pages. Required fields: author/editor, title, chapter/pages, publisher, year. Optional fields: volume/number, series, type, address, edition, month, note, key.
  • incollection – A part of a book having its own title. Required fields: author, title, booktitle, publisher, year. Optional fields: editor, volume/number, series, type, chapter, pages, address, edition, month, note, key.
  • inproceedings – An article in a conference proceedings. Required fields: author, title, booktitle, year. Optional fields: editor, volume/number, series, pages, address, month, organization, publisher, note, key.
  • manual – Technical documentation. Required fields: title. Optional fields: author, organization, address, edition, month, year, note, key.
  • mastersthesis – A Master’s thesis. Required fields: author, title, school, year. Optional fields: type, address, month, note, key.
  • misc – For use when nothing else fits. Required fields: none. Optional fields: author, title, howpublished, month, year, note, key.
  • phdthesis – A Ph.D. thesis. Required fields: author, title, school, year. Optional fields: type, address, month, note, key.
  • proceedings – The proceedings of a conference. Required fields: title, year. Optional fields: editor, volume/number, series, address, month, publisher, organization, note, key.
  • techreport – A report published by a school or other institution, usually numbered within a series. Required fields: author, title, institution, year. Optional fields: type, number, address, month, note, key.
  • unpublished – A document having an author and title, but not formally published. Required fields: author, title, note. Optional fields: month, year, key.
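The required-field rules above lend themselves to a simple completeness check. The helper below is hypothetical (missing_fields is not part of clld) and covers only a few entry types; a slash such as author/editor is read as "either field will do".

```python
# Required fields for a few entry types, taken from the descriptions above;
# 'author/editor' means either field satisfies the requirement.
REQUIRED = {
    'article': ['author', 'title', 'journal', 'year'],
    'book': ['author/editor', 'title', 'publisher', 'year'],
    'phdthesis': ['author', 'title', 'school', 'year'],
    'misc': [],
}


def missing_fields(genre, fields):
    """Hypothetical helper: return the required fields absent from `fields`."""
    missing = []
    for requirement in REQUIRED.get(genre, []):
        # Any one of the slash-separated alternatives is enough.
        alternatives = requirement.split('/')
        if not any(alt in fields for alt in alternatives):
            missing.append(requirement)
    return missing
```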
class clld.lib.bibtex.Record(genre, id_, *args, **kw)

A BibTeX record is basically an ordered dict with two special properties: id and genre.

To overcome BibTeX’s limitation of a single value per field, we also allow fields, i.e. values of the dict, to be iterables of strings. Note that, to support this use case comprehensively, the various methods of retrieving values behave differently, i.e. values will be

  • joined to a string in __getitem__,
  • retrievable as assigned with get (i.e. only use get if you know how a value was assigned),
  • retrievable as a list with getall.


Unknown genres are converted to “misc”.

>>> r = Record('article', '1', author=['a', 'b'], editor='a and b')
>>> assert r['author'] == 'a and b'
>>> assert r.get('author') == r.getall('author')
>>> assert r['editor'] == r.get('editor')
>>> assert r.getall('editor') == ['a', 'b']
Returns: list of strings representing the values of the record for field ‘key’.
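The three retrieval rules can be sketched with a dict subclass. MiniRecord is hypothetical and simplified: it assumes multiple values are joined with, and split on, ' and ' (as in the doctest), and it maps anything outside the standard BibTeX entry types to 'misc'.

```python
# Standard BibTeX entry types; any other genre becomes 'misc'.
GENRES = {
    'article', 'book', 'booklet', 'conference', 'inbook', 'incollection',
    'inproceedings', 'manual', 'mastersthesis', 'misc', 'phdthesis',
    'proceedings', 'techreport', 'unpublished',
}


class MiniRecord(dict):
    """Hypothetical, simplified stand-in for a multi-valued BibTeX record.

    Plain dicts keep insertion order (Python 3.7+), so fields stay ordered.
    """

    sep = ' and '

    def __init__(self, genre, id_, **fields):
        super().__init__(**fields)
        self.genre = genre if genre in GENRES else 'misc'
        self.id = id_

    def __getitem__(self, key):
        # Lists of values are joined into one string.
        value = super().__getitem__(key)
        if isinstance(value, (list, tuple)):
            return self.sep.join(value)
        return value

    def getall(self, key):
        # dict.get bypasses __getitem__, so this is the value as assigned.
        value = self.get(key)
        if value is None:
            return []
        if isinstance(value, (list, tuple)):
            return list(value)
        # Strings are split into their individual values.
        return value.split(self.sep)
```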

Linearize the BibTeX record according to the rules of the unified style:

Book: author. year. booktitle. (series, volume.) address: publisher.

Article: author. year. title. journal volume(issue). pages.

Incollection: author. year. title. In editor (ed.), booktitle, pages. address: publisher.
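To make the Article pattern concrete, a hypothetical linearize_article fills it from a plain dict of BibTeX fields. It assumes the issue is stored in the number field, drops the parenthesised issue when it is absent, and ignores finer points of the style such as titles that already end in punctuation.

```python
def linearize_article(rec):
    """Hypothetical sketch of the Article pattern:
    author. year. title. journal volume(issue). pages.
    """
    issue = rec.get('number')
    # volume(issue), or just volume when no issue number is given
    volume = rec['volume'] + ('({})'.format(issue) if issue else '')
    return '{}. {}. {}. {} {}. {}.'.format(
        rec['author'], rec['year'], rec['title'],
        rec['journal'], volume, rec['pages'])
```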


Remove invalid Unicode (control) characters.

>>> assert stripctrlchars(u'a\u0008\u000ba') == u'aa'
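A sketch of control-character stripping based on the Unicode general category Cc. The name strip_ctrl_chars is hypothetical, and keeping tab, newline and carriage return is an assumption of this sketch, not documented behaviour of stripctrlchars.

```python
import unicodedata

# Whitespace control characters that this sketch deliberately keeps.
KEEP = {'\t', '\n', '\r'}


def strip_ctrl_chars(s):
    """Hypothetical sketch: drop characters in Unicode category Cc."""
    return ''.join(
        c for c in s
        if c in KEEP or unicodedata.category(c) != 'Cc')
```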

Unescape Unicode escape sequences: match all 3- and 4-digit sequences and replace every ‘?[\u....]’ with the corresponding Unicode character.

There are some decimal/octal mismatches in the Unicode encodings found in BibTeX.

>>> r = u_unescape(r'?[\u123] ?[\u1234]')
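A regular-expression sketch of this replacement. It is hypothetical (u_unescape_sketch) and assumes, as one reading of the decimal/octal remark, that the 3- or 4-digit number is a decimal code point; its output need not match clld's u_unescape.

```python
import re

# Matches ?[\uNNN] or ?[\uNNNN]: literal '?[', a backslash, 'u', 3-4 digits, ']'.
UU_PATTERN = re.compile(r'\?\[\\u(?P<number>[0-9]{3,4})\]')


def u_unescape_sketch(s):
    """Replace each ?[\\uNNNN] sequence with the character it encodes."""
    # Assumption: the digits are read as a decimal code point.
    return UU_PATTERN.sub(lambda m: chr(int(m.group('number'))), s)
```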

Transform LaTeX escape sequences of the type `e into Unicode.
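For simple accent commands, the transformation amounts to appending a Unicode combining mark to the letter and composing the pair with NFC normalization. The sketch below is hypothetical (latex_accents_to_unicode) and handles only five accent commands; a full implementation would need many more escape sequences.

```python
import re
import unicodedata

# LaTeX accent commands mapped to Unicode combining characters.
ACCENTS = {
    '`': '\u0300',   # grave
    "'": '\u0301',   # acute
    '^': '\u0302',   # circumflex
    '~': '\u0303',   # tilde
    '"': '\u0308',   # diaeresis
}

# \`e, \`{e} and similar: an accent command followed by one letter.
ACCENT_PATTERN = re.compile(r'\\([`\'^~"])\{?([A-Za-z])\}?')


def latex_accents_to_unicode(s):
    """Hypothetical sketch: convert simple LaTeX accent escapes."""
    def repl(m):
        # Letter plus combining mark, composed into one character (NFC).
        return unicodedata.normalize('NFC', m.group(2) + ACCENTS[m.group(1)])
    return ACCENT_PATTERN.sub(repl, s)
```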


class clld.lib.coins.ContextObject(sid, mtx, *data)
>>> c = ContextObject('sid', 'journal', ('jtitle', 'â'))
>>> assert '%C3%A2' in c.span_attrs()['title']
>>> c = ContextObject('sid', 'journal', ('jtitle', u'â'))
>>> assert '%C3%A2' in c.span_attrs()['title']
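ContextObject produces COinS markup (ContextObjects in Spans): an HTML span of class Z3988 whose title attribute holds an OpenURL key/encoded-value string. The hypothetical coins_span_attrs below sketches how such a title can be assembled with urllib.parse.urlencode, which percent-encodes non-ASCII values as UTF-8, hence %C3%A2 for â. The exact keys clld emits may differ.

```python
from urllib.parse import urlencode


def coins_span_attrs(sid, mtx, *data):
    """Hypothetical sketch: attributes for a COinS <span> element."""
    pairs = [
        ('ctx_ver', 'Z39.88-2004'),                      # OpenURL version
        ('rft_val_fmt', 'info:ofi/fmt:kev:mtx:' + mtx),  # metadata format
        ('rfr_id', 'info:sid/' + sid),                   # referrer id
    ]
    # Each (key, value) pair becomes an rft.<key> entry.
    pairs.extend(('rft.' + key, value) for key, value in data)
    # urlencode percent-encodes values as UTF-8, e.g. 'â' -> '%C3%A2'.
    return {'class': 'Z3988', 'title': urlencode(pairs)}
```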


Functionality to retrieve data from a FileMaker server using the ‘Custom Web Publishing with XML’ protocol.

class clld.lib.fmpxml.Client(host, db, user, password, limit=1000, cache=None, verbose=True)

Client for FileMaker’s ‘Custom Web Publishing with XML’ feature.

class clld.lib.fmpxml.Result(content)

Parses a FileMaker Pro XML result.
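Parsing such a result can be sketched with xml.etree.ElementTree. The sketch assumes the FMPXMLRESULT grammar (field names in METADATA/FIELD, values in RESULTSET/ROW/COL/DATA) and its namespace; parse_fmpxml is hypothetical and only reads the first DATA element of each column.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the FMPXMLRESULT grammar.
NS = {'fm': 'http://www.filemaker.com/fmpxmlresult'}


def parse_fmpxml(content):
    """Hypothetical sketch: return a list of dicts, one per ROW."""
    root = ET.fromstring(content)
    # Field names, in column order.
    names = [f.get('NAME') for f in root.findall('fm:METADATA/fm:FIELD', NS)]
    items = []
    for row in root.findall('fm:RESULTSET/fm:ROW', NS):
        values = []
        for col in row.findall('fm:COL', NS):
            # Take the first DATA element; a COL without one yields None.
            data = col.find('fm:DATA', NS)
            values.append(data.text if data is not None else None)
        items.append(dict(zip(names, values)))
    return items
```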


Normalize markup in FileMaker data.

>>> assert normalize_markup('') is None
>>> assert normalize_markup('<span>bla</span>') == 'bla'
>>> s = '<span style="font-style: italic;">bla</span>'
>>> assert normalize_markup(s) == s
>>> s = '<span style="font-weight: bold;">bla</span>'
>>> assert normalize_markup(s) == s
>>> s = '<span style="font-variant: small-caps;">bla</span>'
>>> assert normalize_markup(s) == s
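The doctests imply a simple rule: empty input gives None, spans carrying italic, bold or small-caps styling are kept, and other spans are unwrapped. A regex-based sketch of that rule follows; normalize_markup_sketch is hypothetical and, unlike an HTML parser, does not handle nested spans.

```python
import re

# Span styles that are kept as they are; all other spans are unwrapped.
KEEP_STYLES = {
    'font-style: italic;',
    'font-weight: bold;',
    'font-variant: small-caps;',
}

# A non-nested <span>, optionally with a style attribute.
SPAN = re.compile(
    r'<span(?: style="(?P<style>[^"]*)")?>(?P<body>.*?)</span>', re.S)


def normalize_markup_sketch(s):
    """Hypothetical, regex-based sketch; does not handle nested spans."""
    if not s:
        return None

    def repl(m):
        # Keep whitelisted styled spans, otherwise keep only the content.
        if m.group('style') in KEEP_STYLES:
            return m.group(0)
        return m.group('body')

    return SPAN.sub(repl, s)
```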