Upcoming Latin Treebank

docoflove1974 · September 19, 2006

This came over the wires from LinguistList; for those of you who are knowledgeable about the lingua latina, this might be something to look for in the future...or perhaps even contribute to!

Call for Collaboration: Latin Treebank

The Perseus Project has recently received a planning grant from the NSF to investigate the costs and labor involved in constructing a multimillion-word Latin treebank (a large collection of syntactically parsed sentences), along with its potential value for the linguistics and Classics community. While our initial efforts under this grant will focus on syntactically annotating excerpts from Golden Age authors (Caesar, Cicero, Vergil) and the Vulgate, a future multimillion-word corpus would be comprised of writings from the pre-Classical period up through the Early Modern era. To date we've annotated a total of 12,000 words in a style that's predominantly informed by two sources: the dependency grammar used by the Prague Dependency Treebank (itself based on Mel'cuk 1988), and the Latin grammar of Pinkster 1990.

While treebanks provide valuable training data for computational tasks such as grammar induction and automatic syntactic parsing, they also have the potential to be used in traditional research areas as well. Large collections of syntactically parsed sentences have the potential to revolutionize lexicography and philology, as they provide the immediate context for a word's use along with its typical syntactic arguments (this lets us chart, for example, how the meaning of a verb changes as its predominant arguments change). Treebanks enable large-scale research into structurally-based rhetorical devices particularly of interest to Classicists (such as hyperbaton) and they provide the raw data for research in historical linguistics (such as the move in Latin from classical SOV word order to Romance SVO).

The eventual Latin treebank will be openly available to the public; we should, therefore, come to a consensus on how it should be built. To that end we encourage input from the linguistics and Classics community on the treebank design (including the syntactic representation of Latin) and welcome contributions by annotators (for which limited funding is available). Interested collaborators should contact David Bamman (David.Bamman@tufts.edu) at the Perseus Project.

Primus Pilus · September 20, 2006

Very interesting news. Though I would suggest that the Perseus Project as a whole upgrade its servers and potentially even the database software. It's a shame that such an invaluable project is so abysmal for surfing, page loading and searching. I rarely even try using it anymore (though the mirrors are a bit better).
