Universal Dependencies

Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages. Project site: https://universaldependencies.org/

Morphology

The morphological specification of a (syntactic) word in the UD scheme consists of three levels of representation:

  • A lemma representing the semantic content of the word.
  • A part-of-speech tag representing the abstract lexical category associated with the word.
  • A set of features representing lexical and grammatical properties that are associated with the particular word form.

The LEMMA field should contain the canonical or base form of the word.
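
The three levels of representation can be pictured as a small record type. The following is a minimal Python sketch, not part of UD itself; the class name `MorphWord` and its field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MorphWord:
    """Hypothetical container for the three UD morphological levels."""
    form: str   # the word form as it appears in the text
    lemma: str  # canonical or base form of the word
    upos: str   # universal part-of-speech tag
    feats: Dict[str, str] = field(default_factory=dict)  # feature Name -> Value


# Illustrative example: the English plural noun "dogs".
word = MorphWord(form="dogs", lemma="dog", upos="NOUN", feats={"Number": "Plur"})
print(word.lemma, word.upos, word.feats)
```

Keeping the features in a dictionary mirrors the Name=Value shape of UD features and leaves the word form itself separate from its analysis.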

Part-of-Speech tags

UD defines only 17 universal POS tags. Finer-grained distinctions between word classes are expressed through universal features.

  • ADJ: adjective
  • ADP: adposition
  • ADV: adverb
  • AUX: auxiliary
  • CCONJ: coordinating conjunction
  • DET: determiner
  • INTJ: interjection
  • NOUN: noun
  • NUM: numeral
  • PART: particle
  • PRON: pronoun
  • PROPN: proper noun
  • PUNCT: punctuation
  • SCONJ: subordinating conjunction
  • SYM: symbol
  • VERB: verb
  • X: other
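
Because the tag inventory is closed, a tag can be checked against it with a simple set lookup. This is a minimal sketch; the names `UPOS_TAGS` and `is_valid_upos` are hypothetical helpers, not part of any UD tool.

```python
# The 17 universal POS tags listed above.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def is_valid_upos(tag: str) -> bool:
    """Return True if `tag` is one of the 17 universal POS tags."""
    return tag in UPOS_TAGS


print(is_valid_upos("NOUN"))  # a universal tag
print(is_valid_upos("NNS"))   # a language-specific tag, not a universal one
```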

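The CoNLL-U format defines an additional part-of-speech field, XPOS, for language-specific tags. Different languages have different XPOS tagsets.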
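
To show where the universal tag and the language-specific tag sit side by side, here is a minimal sketch that splits one CoNLL-U word line into its ten tab-separated fields. The function name `parse_word_line` and the sample line are illustrative assumptions; the sketch does not handle comment lines, multiword tokens, or empty nodes.

```python
from typing import Dict

# The ten tab-separated fields of a CoNLL-U word line, in order.
CONLLU_FIELDS = (
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
)


def parse_word_line(line: str) -> Dict[str, str]:
    """Split one CoNLL-U word line into a field-name -> value dict."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(CONLLU_FIELDS):
        raise ValueError(f"expected {len(CONLLU_FIELDS)} fields, got {len(values)}")
    return dict(zip(CONLLU_FIELDS, values))


# Illustrative line: "NNS" is a Penn Treebank style tag of the kind
# used as XPOS in English treebanks.
line = "1\tdogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_"
token = parse_word_line(line)
print(token["UPOS"], token["XPOS"])  # NOUN NNS
```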

Each word has exactly one POS tag.

Features

Features are additional pieces of information about the word, its part of speech and morphosyntactic properties.

A feature is written as Name=Value. A word can have several features, separated by "|", for example: Gender=Masc|Number=Sing
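
The Name=Value notation is easy to convert to and from a dictionary. The sketch below is one possible approach; `parse_feats` and `format_feats` are hypothetical helper names. It assumes the CoNLL-U conventions that "_" marks an empty field and that features are written sorted by name, and it keeps multi-valued features (such as comma-separated values) as a single string.

```python
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Parse a 'Name=Value|Name=Value' string into a dict."""
    if feats in ("", "_"):  # "_" marks an empty field in CoNLL-U
        return {}
    result = {}
    for pair in feats.split("|"):
        name, sep, value = pair.partition("=")
        if not sep or not name or not value:
            raise ValueError(f"malformed feature: {pair!r}")
        result[name] = value
    return result


def format_feats(feats: Dict[str, str]) -> str:
    """Serialize a feature dict back to 'Name=Value|...' form, sorted by name."""
    if not feats:
        return "_"
    ordered = sorted(feats.items(), key=lambda item: item[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)


print(parse_feats("Gender=Masc|Number=Sing"))  # {'Gender': 'Masc', 'Number': 'Sing'}
print(format_feats({"Number": "Sing", "Gender": "Masc"}))  # Gender=Masc|Number=Sing
```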

The features available for words are defined in UD's inventory of features.

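The features fall into two broad groups: lexical features (for example PronType, NumType, Poss) and inflectional features, which are further divided into nominal features (for example Gender, Number, Case) and verbal features (for example VerbForm, Tense, Mood).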

Syntax

Syntactic annotation in the UD scheme consists of typed dependency relations between words.
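
A typed dependency relation can be thought of as a labeled edge from a head word to a dependent word. The following minimal sketch builds such edges from per-word head indices, assuming the CoNLL-U convention that heads are 1-based word indices and 0 marks the root. The `Dependency` type, the `to_dependencies` function, and the example sentence are illustrative assumptions.

```python
from typing import List, NamedTuple


class Dependency(NamedTuple):
    head: str       # governing word ("ROOT" for the sentence root)
    relation: str   # dependency type, e.g. "nsubj"
    dependent: str  # the word that depends on the head


def to_dependencies(
    words: List[str], heads: List[int], relations: List[str]
) -> List[Dependency]:
    """Build typed dependencies from 1-based head indices (0 means root)."""
    deps = []
    for word, head, rel in zip(words, heads, relations):
        head_word = "ROOT" if head == 0 else words[head - 1]
        deps.append(Dependency(head_word, rel, word))
    return deps


# Illustrative sentence: "She reads books"
deps = to_dependencies(
    ["She", "reads", "books"],
    [2, 0, 2],
    ["nsubj", "root", "obj"],
)
for d in deps:
    print(f"{d.relation}({d.head}, {d.dependent})")
```

Every word has exactly one head, so the edges for a sentence form a tree rooted at the word whose head index is 0.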

For the full list of universal dependency relations, see https://universaldependencies.org/u/dep/index.html
