UNRV Ancient Roman Empire Forums

Upcoming Latin Treebank



This came over the wire from LinguistList; for those of you who are knowledgeable in the lingua latina, this might be something to look for in the future... or perhaps even contribute to!


Call for Collaboration: Latin Treebank


The Perseus Project has recently received a planning grant from the NSF to investigate the costs and labor involved in constructing a multimillion-word Latin treebank (a large collection of syntactically parsed sentences), along with its potential value for the linguistics and Classics community. While our initial efforts under this grant will focus on syntactically annotating excerpts from Golden Age authors (Caesar, Cicero, Vergil) and the Vulgate, a future multimillion-word corpus would be comprised of writings from the pre-Classical period up through the Early Modern era. To date we've annotated a total of 12,000 words in a style that's predominantly informed by two sources: the dependency grammar used by the Prague Dependency Treebank (itself based on Mel'čuk 1988), and the Latin grammar of Pinkster 1990.


While treebanks provide valuable training data for computational tasks such as grammar induction and automatic syntactic parsing, they also have the potential to be used in traditional research areas. Large collections of syntactically parsed sentences have the potential to revolutionize lexicography and philology, as they provide the immediate context for a word's use along with its typical syntactic arguments (this lets us chart, for example, how the meaning of a verb changes as its predominant arguments change). Treebanks enable large-scale research into structurally based rhetorical devices of particular interest to Classicists (such as hyperbaton), and they provide the raw data for research in historical linguistics (such as the move in Latin from classical SOV word order to Romance SVO).
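
To make the idea concrete (this illustration is not part of the Perseus announcement), here is a minimal Python sketch of what a dependency-annotated sentence might look like and how one could query it. The `Token` layout, the two toy sentences, and the helper names `arguments_of` and `word_order` are all hypothetical; the labels `Pred`, `Sb` and `Obj` are assumed to follow Prague Dependency Treebank naming, and the actual Perseus annotation format may well differ.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Token(NamedTuple):
    idx: int    # 1-based position in the sentence
    form: str   # word as written
    lemma: str  # dictionary form
    head: int   # idx of the governing token; 0 marks the root
    rel: str    # dependency label (PDT-style names, assumed)


# Two hand-made toy sentences, not taken from any real treebank.
SENTENCES = [
    # "puella rosam amat" (the girl loves the rose)
    [Token(1, "puella", "puella", 3, "Sb"),
     Token(2, "rosam", "rosa", 3, "Obj"),
     Token(3, "amat", "amo", 0, "Pred")],
    # "puer amat puellam" (the boy loves the girl)
    [Token(1, "puer", "puer", 2, "Sb"),
     Token(2, "amat", "amo", 0, "Pred"),
     Token(3, "puellam", "puella", 2, "Obj")],
]


def arguments_of(sentences, verb_lemma):
    """Count (relation, lemma) pairs for every direct dependent of the verb."""
    counts = Counter()
    for sent in sentences:
        verb_ids = {t.idx for t in sent if t.lemma == verb_lemma}
        for t in sent:
            if t.head in verb_ids:
                counts[(t.rel, t.lemma)] += 1
    return counts


def word_order(sent) -> Optional[str]:
    """Return e.g. 'SOV' or 'SVO' for the main clause, or None if a slot is missing."""
    root = next((t for t in sent if t.head == 0 and t.rel == "Pred"), None)
    if root is None:
        return None
    slots = {"V": root.idx}
    for t in sent:
        if t.head == root.idx and t.rel == "Sb":
            slots["S"] = t.idx
        elif t.head == root.idx and t.rel == "Obj":
            slots["O"] = t.idx
    if len(slots) < 3:
        return None
    # Sort the slot letters by their position in the sentence.
    return "".join(sorted(slots, key=slots.get))


if __name__ == "__main__":
    print(arguments_of(SENTENCES, "amo"))
    print([word_order(s) for s in SENTENCES])
```

Run over a large corpus grouped by period, counts like these would be the raw material for the argument-structure and word-order studies the call describes.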


The eventual Latin treebank will be openly available to the public; we should, therefore, come to a consensus on how it should be built. To that end we encourage input from the linguistics and Classics community on the treebank design (including the syntactic representation of Latin) and welcome contributions by annotators (for which limited funding is available). Interested collaborators should contact David Bamman (David.Bamman@tufts.edu) at the Perseus Project.


Very interesting news, though I would suggest that the Perseus Project as a whole upgrade its servers and potentially even the database software. It's a shame that such an invaluable project is so abysmal for surfing, page loading and searching. I rarely even try using it anymore (though the mirrors are a bit better).
