cort: coreference resolution toolkit
This Python library consists of two parts. The coreference resolution component implements a latent-variable framework for coreference resolution, which allows you to rapidly devise new approaches (described in our TACL and ACL'15 demo papers). The error analysis component provides extensive functionality for analyzing and visualizing errors made by coreference resolution systems (described in our EMNLP'14 and NAACL'15 demo papers). cort also provides an implementation of the deterministic multigraph coreference resolution system (described in my ACL-SRW'13 paper).

Furthermore, branches in the GitHub repository linked above contain implementations of an extended version of cort described in my PhD thesis, and of the k-best coreference resolution system described in our EMNLP'17 paper.

tilse: timeline summarization and evaluation
This library implements functionality for evaluating and predicting timeline summaries. It provides an implementation of evaluation metrics tailored to timeline summarization (described in our EACL'17 paper). It also implements baselines for timeline summarization and a framework for timeline summarization based on submodular optimization (described in our CoNLL'18 paper).
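
To illustrate the general idea behind summarization via submodular optimization, here is a minimal sketch of greedy selection under a size budget. It is not tilse's API: the function names, the topic-set representation of sentences, and the coverage objective are all assumptions made for illustration, and the sketch ignores the date-related structure that timelines have.

```python
def coverage(selected, sentence_topics):
    """Number of distinct topics covered by the selected sentences.

    Coverage is a standard example of a monotone submodular function.
    `sentence_topics` (hypothetical input) maps a sentence index to a set.
    """
    covered = set()
    for index in selected:
        covered |= sentence_topics[index]
    return len(covered)


def greedy_select(sentence_topics, budget):
    """Greedily pick up to `budget` sentences, maximizing marginal gain."""
    selected = []
    remaining = list(range(len(sentence_topics)))
    while remaining and len(selected) < budget:
        current = coverage(selected, sentence_topics)
        best_index, best_gain = None, 0
        for index in remaining:
            gain = coverage(selected + [index], sentence_topics) - current
            # Strict comparison: ties go to the lowest sentence index.
            if gain > best_gain:
                best_index, best_gain = index, gain
        if best_index is None:
            # No remaining sentence adds anything new, so stop early.
            break
        selected.append(best_index)
        remaining.remove(best_index)
    return selected
```

For example, `greedy_select([{"a", "b"}, {"c", "d"}, {"a", "c"}], budget=2)` selects sentences 0 and 1, which together cover all four topics.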

art: approximate randomization testing
This package performs approximate randomization testing for corpus-wide differences in F1 score or accuracy. It is easily extensible to other metrics. It also ships with a script that transforms the output of the CoNLL scorer for coreference resolution into a suitable input format.
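
For readers unfamiliar with the technique, the following is a minimal sketch of a paired approximate randomization test for a difference in accuracy. It is not the art API: the function names, the 0/1 outcome lists, and the default number of trials are assumptions made for illustration.

```python
import random


def accuracy(outcomes):
    """Fraction of correct predictions; `outcomes` is a list of 0/1 flags."""
    return sum(outcomes) / len(outcomes)


def approximate_randomization_test(outcomes_a, outcomes_b, metric=accuracy,
                                   trials=10000, seed=0):
    """Estimate a two-sided p-value for the difference in `metric`.

    `outcomes_a` and `outcomes_b` hold the per-instance results of two
    systems on the same test instances, in the same order.
    """
    if len(outcomes_a) != len(outcomes_b):
        raise ValueError("both systems must be scored on the same instances")
    rng = random.Random(seed)
    observed = abs(metric(outcomes_a) - metric(outcomes_b))
    at_least_as_extreme = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(outcomes_a, outcomes_b):
            # Under the null hypothesis the system labels are exchangeable,
            # so swap each paired outcome with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        if abs(metric(shuffled_a) - metric(shuffled_b)) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (trials + 1)
```

Passing a different `metric` function is one way such a test can be extended to other measures. For a corpus-level metric such as F1, the per-instance entries would be count tuples (for example true positives, false positives, and false negatives per document) rather than 0/1 flags.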