[Rooster] Dingyou year, Xinhai month, Renzi day / 6th day of the 10th lunar month
Tuesday November 21, 2017

Language Processing

The knowledge of language needed to engage in complex language behavior can be separated into six distinct categories:
  • morphology : the way words are built up from small meaning-bearing units
  • syntax : the structural relationships between words
  • semantics : the meaning of words and sentences
  • phonetics and phonology : linguistic sounds
  • pragmatics : how language is used to accomplish goals
  • discourse : the study of linguistic units larger than a single utterance

One should also take into account:
  • ambiguity : for a given input, multiple alternative linguistic structures can be built
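As a rough illustration of ambiguity, the sketch below enumerates every way an unsegmented string can be split into words from a lexicon. The function name `segmentations`, the toy lexicon, and the example string are assumptions made for this illustration only; they are not taken from any particular segmenter or standard.

```python
from functools import lru_cache


def segmentations(text, lexicon):
    """Return every way to split `text` into a sequence of lexicon words."""
    # Longest lexicon entry bounds how far ahead we need to look.
    max_len = max(map(len, lexicon), default=0)

    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the input yields one (empty) completion.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in lexicon:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return split_from(0)


# Hypothetical toy lexicon, chosen only to make the ambiguity visible.
TOY_LEXICON = frozenset({"研究", "研究生", "生命", "命", "起源"})

for seg in segmentations("研究生命起源", TOY_LEXICON):
    print(" / ".join(seg))
```

With this toy lexicon the same input yields two analyses, 研究 / 生命 / 起源 ("study the origin of life") and 研究生 / 命 / 起源 ("graduate student / fate / origin"). Choosing between such alternatives is the job of the later processing stages.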
In the field of language processing, the following topics are important:
  • word segmentation
  • POS (part-of-speech) tagging
  • phrase identification
  • parsing
  • grammar development
  • lexicon acquisition
  • corpus development
History
1992 : Segmentation standard. The PRC government announces the first national standard for word segmentation (GB 13715).
1993 : Lexicon. Completion and release of the first version of the CKIP lexicon (with the category set and ICG thematic roles); first version of K. Chen's parser for Chinese.
1998 : Segmentation standard. Official announcement of CNS14366 for Taiwan.
2000 : Treebanks. Simultaneous completion and announcement of two Chinese treebanks:
  • Penn Chinese Treebank
  • Sinica Treebank

