https://universaldependencies.org/ Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages.
Morphology
The morphological specification of a (syntactic) word in the UD scheme consists of three levels of representation:
- A lemma representing the semantic content of the word.
 - A part-of-speech tag representing the abstract lexical category associated with the word.
 - A set of features representing lexical and grammatical properties that are associated with the particular word form.
 
In the CoNLL-U format, the LEMMA field should contain the canonical or base form of the word.
 
Part-of-Speech tags
UD defines only 17 universal POS tags; finer-grained distinctions within a part of speech are expressed through universal features.
- ADJ: adjective
 - ADP: adposition
 - ADV: adverb
 - AUX: auxiliary
 - CCONJ: coordinating conjunction
 - DET: determiner
 - INTJ: interjection
 - NOUN: noun
 - NUM: numeral
 - PART: particle
 - PRON: pronoun
 - PROPN: proper noun
 - PUNCT: punctuation
 - SCONJ: subordinating conjunction
 - SYM: symbol
 - VERB: verb
 - X: other
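
As a small illustration, the 17 tags listed above can be kept in a set and used to sanity-check tagger output. This is only a sketch: `UPOS_TAGS` and `is_valid_upos` are hypothetical names chosen for this example, not part of any UD tooling.

```python
# The 17 universal POS tags from the list above.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def is_valid_upos(tag: str) -> bool:
    """Return True if `tag` is one of the universal POS tags (case-sensitive)."""
    return tag in UPOS_TAGS
```

A check like `is_valid_upos("NOUN")` succeeds, while a language-specific tag such as `"NNS"` is rejected because it is not in the universal set.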
 
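The CoNLL-U format defines an additional part-of-speech field, XPOS, which holds language-specific tags; XPOS tagsets differ from language to language.
Each word has exactly one universal POS tag.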
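
To make the relationship between the lemma, the universal POS tag, and XPOS concrete, here is a minimal sketch that splits one CoNLL-U word line into its ten tab-separated fields. The `Token` class, the `parse_token_line` helper, and the sample line are assumptions made up for illustration; the sample is not taken from a real treebank, and the XPOS value is simply a Penn Treebank-style tag.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """The ten fields of a CoNLL-U word line, kept as raw strings."""
    id: str
    form: str
    lemma: str
    upos: str    # universal POS tag
    xpos: str    # language-specific POS tag
    feats: str
    head: str
    deprel: str
    deps: str
    misc: str


def parse_token_line(line: str) -> Token:
    """Split one tab-separated CoNLL-U word line into a Token."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    return Token(*fields)


# Illustrative line, invented for this sketch.
example = "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_"
token = parse_token_line(example)
print(token.lemma, token.upos, token.xpos)
```

The sketch keeps every field as a string and does no validation beyond counting fields; comment lines, multiword tokens, and empty nodes are deliberately out of scope.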
Features
Features are additional pieces of information about the word, its part of speech and morphosyntactic properties.
Features are written as Name=Value. A word can carry several features, separated by "|", for example: Gender=Masc|Number=Sing.
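
The Name=Value notation is easy to take apart programmatically. The following is a minimal sketch, assuming that a lone `_` stands for an empty feature set (as in CoNLL-U); `parse_feats` is a hypothetical helper name, not a function from an existing library.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Parse a string such as 'Gender=Masc|Number=Sing' into a dict."""
    # Assumption: "_" (or an empty string) means the word has no features.
    if feats in ("", "_"):
        return {}
    result: dict[str, str] = {}
    for pair in feats.split("|"):
        name, sep, value = pair.partition("=")
        if not sep or not name or not value:
            raise ValueError(f"malformed feature: {pair!r}")
        result[name] = value
    return result


print(parse_feats("Gender=Masc|Number=Sing"))
```

Returning a plain dictionary keeps lookups such as `parse_feats(s).get("Number")` simple; the sketch does not check names or values against the UD feature inventory.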
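UD's inventory of features defines which features are available.
Features fall into the following categories:
- Lexical features: properties of lexemes or lemmas.
 - Inflectional features: properties of individual word forms, which vary with inflection.
 - Layered features: features that can be marked more than once on the same word; see https://universaldependencies.org/u/overview/feat-layers.html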
 
Syntax
Syntactic annotation in the UD scheme consists of typed dependency relations between words.
For the universal dependency relations, see https://universaldependencies.org/u/dep/index.html
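
A typed dependency relation can be pictured as a (head, dependent, relation) triple. The sketch below is one possible representation, not the UD reference data model: the `Dependency` class and `describe` function are hypothetical, the two-word sentence is invented, and head index 0 is assumed to denote the artificial root, following the CoNLL-U convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    head: int       # 1-based index of the head word; 0 means the artificial root
    dependent: int  # 1-based index of the dependent word
    relation: str   # relation type, e.g. "nsubj"


def describe(dep: Dependency, words: list[str]) -> str:
    """Render a dependency as relation(head, dependent)."""
    head = "ROOT" if dep.head == 0 else words[dep.head - 1]
    return f"{dep.relation}({head}, {words[dep.dependent - 1]})"


# Invented example sentence: "Dogs bark"
words = ["Dogs", "bark"]
deps = [
    Dependency(head=2, dependent=1, relation="nsubj"),
    Dependency(head=0, dependent=2, relation="root"),
]

for dep in deps:
    print(describe(dep, words))
```

Each word appears exactly once as a dependent, so the list of triples describes a tree rooted at the word attached to index 0.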