Lib

Reading delimiter-separated-values dsv

Support for reading and writing delimiter-separated value files.

clld.lib.dsv.normalize_name(s)[source]

This function is called to convert ASCII strings to something that can pass as a Python attribute name, to be used with namedtuples.

>>> assert normalize_name('class') == 'class_'
>>> assert normalize_name('a-name') == 'a_name'
>>> assert normalize_name('a näme') == 'a_name'
>>> assert normalize_name('Name') == 'Name'
>>> assert normalize_name('') == '_'
>>> assert normalize_name('1') == '_1'
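
A minimal sketch of how such a normalization could work, reproducing the doctest results above. It is not clld's implementation: the name normalize_name_sketch and the individual steps (NFKD decomposition, replacing runs of non-word characters, keyword check) are assumptions made for illustration.

```python
import keyword
import re
import unicodedata
from collections import namedtuple


def normalize_name_sketch(s):
    """Turn an arbitrary string into a valid Python identifier (sketch only)."""
    # Decompose accented characters and drop everything that is not ASCII.
    s = unicodedata.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii')
    # Replace each run of characters that cannot appear in an identifier.
    s = re.sub(r'[^0-9A-Za-z_]+', '_', s)
    # Identifiers must be non-empty and must not start with a digit.
    if not s or s[0].isdigit():
        s = '_' + s
    # Reserved words get a trailing underscore.
    if keyword.iskeyword(s):
        s += '_'
    return s


# Column headers become usable namedtuple field names.
Row = namedtuple('Row', [normalize_name_sketch(h) for h in ['class', 'a-name']])
row = Row('noun', 'x')
```

Note that collections.namedtuple rejects field names starting with an underscore, so results such as '_1' would still need rename=True or further handling.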
clld.lib.dsv.reader(lines_or_file, namedtuples=False, dicts=False, encoding=u'utf8', **kw)[source]
Parameters:
  • lines_or_file – Content to be read. Either a file handle, a file path or a list of strings.
  • namedtuples – Yield namedtuples.
  • dicts – Yield dicts.
  • encoding – Encoding of the content.
  • kw – Keyword parameters are passed through to csv.reader. Note that, unlike csv.reader, the delimiter defaults to a tab character (‘\t’), not ‘,’.
Returns:

A generator over the rows.
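
The following stand-alone sketch illustrates the three output modes using only the standard library. It is not the clld code: the name read_rows is hypothetical, it accepts only an iterable of text lines (no file paths or encoding handling), and it assumes a tab as the default delimiter and that the first row holds the column names when namedtuples or dicts are requested.

```python
import csv
from collections import namedtuple


def read_rows(lines, namedtuples=False, dicts=False, **kw):
    """Yield rows from an iterable of text lines (sketch only)."""
    kw.setdefault('delimiter', '\t')  # assumption: tab unless overridden
    rows = csv.reader(lines, **kw)
    if not (namedtuples or dicts):
        # Plain mode: every row is a list of strings.
        yield from rows
        return
    header = next(rows, None)  # assumption: first row names the columns
    if header is None:
        return
    if namedtuples:
        # rename=True replaces invalid field names with positional ones.
        Row = namedtuple('Row', header, rename=True)
        for row in rows:
            yield Row(*row)
    else:
        for row in rows:
            yield dict(zip(header, row))


lines = ['id\tname', '1\tabc']
plain = list(read_rows(lines))
as_dicts = list(read_rows(lines, dicts=True))
as_tuples = list(read_rows(lines, namedtuples=True))
```

Because the function is a generator, nothing is parsed until the caller starts iterating.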

iso

Functionality to gather information about ISO 639-3 codes from sil.org.

clld.lib.iso.get(path)[source]

Retrieve a resource from the SIL site and return its representation.

clld.lib.iso.get_documentation(code)[source]

Scrape information about an ISO 639-3 code from the documentation page.

clld.lib.iso.get_tab(name)[source]

generator for entries in a tab file specified by name.

clld.lib.iso.get_taburls()[source]

retrieves the current (date-stamped) file names for download files from sil’s download page.

rdf

This module provides functionality for handling our data as rdf.

class clld.lib.rdf.ClldGraph(*args, **kw)[source]

augment the standard rdflib.Graph by making sure our standard ns prefixes are always bound.

class clld.lib.rdf.Notation

Notation(name, extension, mimetype, uri)

extension

Alias for field number 1

mimetype

Alias for field number 2

name

Alias for field number 0

uri

Alias for field number 3
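
The field numbering above corresponds to a namedtuple declared with the fields in the order name, extension, mimetype, uri. A small sketch of an equivalent declaration follows; the Turtle values are illustrative only and are not taken from clld.

```python
from collections import namedtuple

# Same field order as documented: name=0, extension=1, mimetype=2, uri=3.
Notation = namedtuple('Notation', 'name extension mimetype uri')

# Illustrative instance; these values are an assumption, not clld data.
turtle = Notation('turtle', 'ttl', 'text/turtle', 'http://www.w3.org/ns/formats/Turtle')

# Fields are reachable by attribute or by position.
fields_by_position = [turtle[i] for i in range(4)]
```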

clld.lib.rdf.properties_as_xml_snippet(subject, props)[source]

somewhat ugly way to get at a snippet of an rdf-xml serialization of properties of a subject.

bibtex

Functionality to handle bibliographical data in the BibTeX format.

class clld.lib.bibtex.Database(records)[source]

a class to handle bibtex databases, i.e. a container class for Record instances.

classmethod from_file(bibFile, encoding='utf8', lowercase=False)[source]

a bibtex database defined by a bib-file

Parameters: bibFile – path of the BibTeX database file to be read.

keymap[source]

map bibtex record ids to list index
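
As a rough illustration of such an id-to-index map, assuming the database keeps its records in a list: the Rec type, build_keymap and the record ids below are made up for the example and stand in for the real Record and Database classes.

```python
from collections import namedtuple

# Hypothetical minimal record type; the real Record class carries more state.
Rec = namedtuple('Rec', 'genre id')


def build_keymap(records):
    """Map each record id to its position in the list of records."""
    return {rec.id: index for index, rec in enumerate(records)}


records = [Rec('book', 'meier2005'), Rec('article', 'doe2001')]
keymap = build_keymap(records)

# Look a record up by id without scanning the whole list.
found = records[keymap['doe2001']]
```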

class clld.lib.bibtex.EntryType[source]
article
An article from a journal or magazine. Required fields: author, title, journal, year. Optional fields: volume, number, pages, month, note, key.
book
A book with an explicit publisher. Required fields: author/editor, title, publisher, year. Optional fields: volume/number, series, address, edition, month, note, key.
booklet
A work that is printed and bound, but without a named publisher or sponsoring institution. Required fields: title. Optional fields: author, howpublished, address, month, year, note, key.
conference
The same as inproceedings, included for Scribe compatibility.
inbook
A part of a book, usually untitled. May be a chapter (or section or whatever) and/or a range of pages. Required fields: author/editor, title, chapter/pages, publisher, year. Optional fields: volume/number, series, type, address, edition, month, note, key.
incollection
A part of a book having its own title. Required fields: author, title, booktitle, publisher, year. Optional fields: editor, volume/number, series, type, chapter, pages, address, edition, month, note, key.
inproceedings
An article in a conference proceedings. Required fields: author, title, booktitle, year. Optional fields: editor, volume/number, series, pages, address, month, organization, publisher, note, key.
manual
Technical documentation. Required fields: title. Optional fields: author, organization, address, edition, month, year, note, key.
mastersthesis
A Master’s thesis. Required fields: author, title, school, year. Optional fields: type, address, month, note, key.
misc
For use when nothing else fits. Required fields: none. Optional fields: author, title, howpublished, month, year, note, key.
phdthesis
A Ph.D. thesis. Required fields: author, title, school, year. Optional fields: type, address, month, note, key.
proceedings
The proceedings of a conference. Required fields: title, year. Optional fields: editor, volume/number, series, address, month, publisher, organization, note, key.
techreport
A report published by a school or other institution, usually numbered within a series. Required fields: author, title, institution, year. Optional fields: type, number, address, month, note, key.
unpublished
A document having an author and title, but not formally published. Required fields: author, title, note. Optional fields: month, year, key.
class clld.lib.bibtex.Record(genre, id_, *args, **kw)[source]

A BibTeX record is basically an ordered dict with two special properties - id and genre.

To overcome the limitation of single values per field in BibTeX, we allow fields, i.e. values of the dict, to be iterables of strings as well. Note that to support this use case comprehensively, the various methods of retrieving values behave differently, i.e. values will be

  • joined to a string in __getitem__,
  • retrievable as assigned with get (i.e. only use get if you know how a value was assigned),
  • retrievable as list with getall

Note

Unknown genres are converted to “misc”.

>>> r = Record('article', '1', author=['a', 'b'], editor='a and b')
>>> assert r['author'] == 'a and b'
>>> assert r.get('author') == r.getall('author')
>>> assert r['editor'] == r.get('editor')
>>> assert r.getall('editor') == ['a', 'b']
getall(key)[source]
Returns: list of strings representing the values of the record for field ‘key’.
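
The three access paths described for Record can be sketched with a plain dict subclass. This is an illustration under stated assumptions, not the clld class: RecordSketch is a hypothetical name, ‘ and ’ is used as the separator for every field, and the conversion of unknown genres is left out.

```python
class RecordSketch(dict):
    """Dict whose values may be strings or lists of strings (sketch only)."""

    SEP = ' and '  # assumption: one separator for every field

    def __init__(self, genre, id_, **fields):
        super().__init__(fields)
        self.genre = genre
        self.id = id_

    def __getitem__(self, key):
        # Item access always yields a single string.
        value = super().__getitem__(key)
        return value if isinstance(value, str) else self.SEP.join(value)

    def getall(self, key):
        # dict.get bypasses __getitem__, so it returns the value as assigned.
        value = self.get(key, [])
        return value.split(self.SEP) if isinstance(value, str) else list(value)


r = RecordSketch('article', '1', author=['a', 'b'], editor='a and b')
```

Since dict.get does not go through the overridden __getitem__, get returns whatever was assigned, which is why it should only be used when the caller knows how the value was stored.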
text()[source]

linearize the bib record according to the rules of the unified style

Book: author. year. booktitle. (series, volume.) address: publisher.

Article: author. year. title. journal volume(issue). pages.

Incollection: author. year. title. In editor (ed.), booktitle, pages. address: publisher.
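
To make the Article pattern concrete, here is a small sketch that assembles the parts in the stated order. The helper linearize_article and the sample field values are hypothetical, and the sketch assumes that the BibTeX field number supplies the issue; the real text() method may treat missing fields and punctuation differently.

```python
def linearize_article(fields):
    """Article: author. year. title. journal volume(issue). pages."""
    journal = fields['journal']
    if fields.get('volume'):
        journal += ' ' + fields['volume']
        if fields.get('number'):
            # Assumption: the BibTeX 'number' field holds the issue.
            journal += '(%s)' % fields['number']
    parts = [fields['author'], fields['year'], fields['title'], journal, fields.get('pages')]
    # Skip empty parts and end each remaining part with exactly one period.
    return ' '.join(part.rstrip('.') + '.' for part in parts if part)


# Placeholder data, made up for the example.
sample = {
    'author': 'Doe, Jane',
    'year': '2001',
    'title': 'A title',
    'journal': 'Journal of Examples',
    'volume': '12',
    'number': '3',
    'pages': '1-10',
}
line = linearize_article(sample)
```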

clld.lib.bibtex.stripctrlchars(string)[source]

remove unicode invalid characters

>>> stripctrlchars(u'a\u0008\u000ba')
u'aa'
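
A possible way to strip such characters with a regular expression, written for Python 3. The name strip_ctrl_chars_sketch and the exact character set are assumptions: here "invalid" is taken to mean the C0 control characters other than tab, line feed and carriage return.

```python
import re

# Assumption: remove C0 controls except \t (0x09), \n (0x0a) and \r (0x0d).
CTRL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def strip_ctrl_chars_sketch(text):
    """Return text without the control characters matched above."""
    return CTRL_CHARS.sub('', text)


cleaned = strip_ctrl_chars_sketch('a\u0008\u000ba')
```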
clld.lib.bibtex.u_unescape(s)[source]

Decode Unicode escape sequences: match all 3- or 4-digit sequences and replace every ‘?[\u....]’ with the corresponding Unicode character.

There are some decimal/octal mismatches in unicode encodings in bibtex

>>> r = u_unescape(r'?[\u123] ?[\u1234]')
clld.lib.bibtex.unescape(string)[source]

Transform LaTeX escape sequences of the type \`e into Unicode.
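
A minimal sketch of this kind of conversion for LaTeX accent escapes such as \`e, covering only five accent commands applied to a single ASCII letter. The name unescape_sketch and the mapping table are assumptions for illustration; the library function may handle many more sequences.

```python
import re
import unicodedata

# LaTeX accent command -> Unicode combining character.
COMBINING = {
    '`': '\u0300',  # grave
    "'": '\u0301',  # acute
    '^': '\u0302',  # circumflex
    '~': '\u0303',  # tilde
    '"': '\u0308',  # diaeresis
}

# A backslash, one accent character, then a letter with optional braces.
ACCENT = re.compile(r"""\\([`'^~"])\{?([A-Za-z])\}?""")


def unescape_sketch(text):
    """Replace simple LaTeX accent escapes with precomposed characters."""
    def repl(match):
        accent, letter = match.groups()
        # NFC composes letter + combining mark into a single code point.
        return unicodedata.normalize('NFC', letter + COMBINING[accent])
    return ACCENT.sub(repl, text)


converted = unescape_sketch(r"caf\'e")
```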

coins

class clld.lib.coins.ContextObject(sid, mtx, *data)[source]
>>> c = ContextObject('sid', 'journal', ('jtitle', 'â'))
>>> assert '%C3%A2' in c.span_attrs()['title']
>>> c = ContextObject('sid', 'journal', ('jtitle', u'â'))
>>> assert '%C3%A2' in c.span_attrs()['title']
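
The doctests check that non-ASCII values end up percent-encoded as UTF-8 in the span's title attribute. A sketch of how such attributes can be built with the standard library follows, written for Python 3. The function coins_span_attrs is hypothetical, and the key names (ctx_ver, rft_val_fmt, rfr_id, the rft. prefix) and the Z3988 class follow the general COinS convention rather than anything confirmed about clld's output.

```python
from urllib.parse import urlencode


def coins_span_attrs(sid, mtx, *data):
    """Build class and title attributes for a COinS span (sketch only)."""
    pairs = [
        ('ctx_ver', 'Z39.88-2004'),
        ('rft_val_fmt', 'info:ofi/fmt:kev:mtx:' + mtx),
        ('rfr_id', 'info:sid/' + sid),
    ]
    # Referent metadata keys carry the 'rft.' prefix.
    pairs.extend(('rft.' + key, value) for key, value in data)
    # urlencode percent-encodes str values as UTF-8 by default.
    return {'class': 'Z3988', 'title': urlencode(pairs)}


attrs = coins_span_attrs('sid', 'journal', ('jtitle', '\u00e2'))
```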

fmpxml

Functionality to retrieve data from a FileMaker server using the ‘Custom Web Publishing with XML’ protocol.

class clld.lib.fmpxml.Client(host, db, user, password, limit=1000, cache=None, verbose=True)[source]

Client for FileMaker’s ‘Custom Web Publishing with XML’ feature.

class clld.lib.fmpxml.Result(content)[source]

Parses a FileMaker Pro XML result.

clld.lib.fmpxml.normalize_markup(s)[source]

normalize markup in filemaker data

>>> assert normalize_markup('') is None
>>> assert normalize_markup('<span>bla</span>') == 'bla'
>>> s = '<span style="font-style: italic;">bla</span>'
>>> assert normalize_markup(s) == s
>>> s = '<span style="font-weight: bold;">bla</span>'
>>> assert normalize_markup(s) == s
>>> s = '<span style="font-variant: small-caps;">bla</span>'
>>> assert normalize_markup(s) == s
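
The behaviour shown in these doctests can be approximated with a regular expression: spans carrying one of the three listed styles are kept, all other spans are unwrapped, and empty input yields None. This is a sketch under those assumptions, not the library code; normalize_markup_sketch is a hypothetical name and nested spans are not handled.

```python
import re

# Styles whose spans are preserved, taken from the doctests above.
KEEP = (
    'font-style: italic;',
    'font-weight: bold;',
    'font-variant: small-caps;',
)
SPAN = re.compile(r'<span(?P<attrs>[^>]*)>(?P<body>.*?)</span>', re.DOTALL)


def normalize_markup_sketch(s):
    """Drop span tags without a meaningful style (sketch only)."""
    if not s:
        return None

    def repl(match):
        if any(style in match.group('attrs') for style in KEEP):
            return match.group(0)  # keep the styled span untouched
        return match.group('body')  # unwrap everything else

    return SPAN.sub(repl, s)


italic = '<span style="font-style: italic;">bla</span>'
```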