"I understand a fury in your words, but not your words." (- William Shakespeare, Othello, 4.2) | English | 日本語 | Polski

Michal Ptaszynski / Research

ML-Ask: Affect Analysis System

This page contains information about ML-Ask, a system for Affect Analysis of textual input in Japanese. It is based on the linguistic assumption that the emotional states of a speaker are conveyed by emotional expressions used in emotive utterances. ML-Ask first separates emotive utterances from non-emotive ones, and then searches the emotive utterances for expressions of specific emotion types.

System Description

ML-Ask, or eMotive eLement and Expression Analysis system, is a keyword-based, language-dependent system for automatic affect annotation of utterances in Japanese. It uses a two-step procedure:
    1. Specifying whether a sentence is emotive, and
    2. Recognizing particular emotion types in utterances described as emotive.
ML-Ask is based on the idea of two-part classification of realizations of emotions in language into:
    1) Emotive elements, or emotemes, which indicate that a sentence is emotive but do not specify which emotions have been expressed. For example, interjections such as “whoa!” or “Oh!” indicate that the speaker (the producer of the utterance) has conveyed some emotion. However, an analysis of those words alone cannot determine precisely what kind of emotion the speaker conveyed.
    2) Emotive expressions, which are words or phrases that directly describe emotional states, but which can be used both to express one’s own emotions and to describe an emotion without emotional engagement.
I collected and hand-crafted a database of 907 emotemes, which fall into groups such as:
  • interjections: すごい sugoi (great!)
  • mimetic expressions (gitaigo in Japanese): わくわく wakuwaku (heart pounding)
  • vulgar language: やがる -yagaru (syntactic morpheme used in verb vulgarization)
  • emotive sentence markers: ‘!’, or ‘??’ (sentence markers indicating emotiveness)
    A set of features similar to what I define as emotemes has also been applied in other research on discriminating between emotive (emotional/subjective) and non-emotive (neutral/objective) sentences (see, for example, Wiebe et al., 2005; Wilson & Wiebe, 2005; Aman & Szpakowicz, 2007).
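The first step of the procedure (deciding whether a sentence is emotive) can be illustrated with a minimal Python sketch. This is not the released Perl code: the tiny `EMOTEMES` dictionary holds only the examples listed above rather than the full 907-entry database, the category names and the function names `find_emotemes` and `is_emotive` are hypothetical, the full-width marker variants are my own addition, and plain substring matching stands in for whatever matching the system actually performs.

```python
# Hypothetical mini-lexicon built only from the examples on this page;
# the real emoteme database contains 907 entries.
EMOTEMES = {
    "interjection": ["すごい"],
    "mimetic": ["わくわく"],
    "vulgar": ["やがる"],
    # Full-width variants are an assumption added for this sketch.
    "marker": ["!", "！", "??", "？？"],
}


def find_emotemes(sentence):
    """Return (category, emoteme) pairs found by plain substring matching.

    Conjugated forms (e.g. of -yagaru) are not handled here; that would
    require morphological analysis.
    """
    found = []
    for category, items in EMOTEMES.items():
        for item in items:
            if item in sentence:
                found.append((category, item))
    return found


def is_emotive(sentence):
    """A sentence counts as emotive if at least one emoteme is present."""
    return len(find_emotemes(sentence)) > 0
```

Under these assumptions, `is_emotive("すごい！")` is true because both an interjection and a sentence marker are found, while a plain statement such as `"今日は月曜日です。"` contains no emoteme and is treated as non-emotive.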
    Emotive expressions can be realized by various parts of speech and phrases, such as:
  • nouns: 愛情 aijou (love),
  • verbs: 悲しむ kanashimu (to feel sad, to grieve)
  • adjectives: 嬉しい ureshii (happy)
  • phrases: 虫唾が走る mushizu ga hashiru (to give one the creeps [from hate])
    As its collection of emotive expressions, ML-Ask uses a database based on Akira Nakamura’s “Emotive Expression Dictionary”. The emotive expression database is a collection of over two thousand expressions describing emotional states. It also incorporates an emotion classification that reflects Japanese language and culture. Each expression is classified as representing one specific emotion type, or more than one if applicable. The ten emotion types are: ki/yorokobi (joy, delight), dō/ikari (anger), ai/aware (sorrow, sadness, gloom), fu/kowagari (fear), chi/haji (shame, shyness, bashfulness), kō/suki (liking, fondness), en/iya (dislike, detestation), kō/takaburi (excitement), an/yasuragi (relief), and kyō/odoroki (surprise, amazement). The distribution of expressions across the emotion classes is shown in the table below.
    Emotion class   Number of expressions   Emotion class   Number of expressions
    dislike         532                     fondness        197
    excitement      269                     fear            147
    sadness         232                     surprise        129
    joy             224                     relief          106
    anger           199                     shame           65
    Sum             2100
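The second step (recognizing emotion types) amounts to looking up emotive expressions in a lexicon keyed by emotion class. The Python sketch below is illustrative only: `EMOTION_LEXICON` contains just the four example expressions given above, their assignment to classes is my reading of the English glosses rather than a quotation from the database, and `recognize_emotions` is a hypothetical name. Surface substring matching is used for brevity, so conjugated verbs and adjectives would be missed.

```python
# Hypothetical excerpt: the real database holds 2100 expressions in ten classes.
# Class assignments below are assumptions based on the English glosses.
EMOTION_LEXICON = {
    "fondness": ["愛情"],        # aijou (love)
    "sadness": ["悲しむ"],       # kanashimu (to feel sad, to grieve)
    "joy": ["嬉しい"],           # ureshii (happy)
    "dislike": ["虫唾が走る"],   # mushizu ga hashiru (to give one the creeps)
}


def recognize_emotions(sentence):
    """Map each emotion class to the expressions of that class found in the sentence.

    An expression listed under several classes would be reported under
    each of them, since every class is checked independently.
    """
    result = {}
    for emotion, expressions in EMOTION_LEXICON.items():
        hits = [expr for expr in expressions if expr in sentence]
        if hits:
            result[emotion] = hits
    return result
```

With this toy lexicon, `recognize_emotions("合格して嬉しい！")` yields a single entry for joy, and a sentence without any listed expression yields an empty dictionary.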

    ML-Ask also implements the idea of Contextual Valence Shifters (CVS) for Japanese. The idea of CVS, as proposed by Polanyi and Zaenen (2006), assumes two kinds of CVS: negations and intensifiers. Negations are words and phrases like “not”, “never” or “not quite”, which change the semantic polarity of the evaluative word they refer to. Intensifiers are words like “very”, “very much” or “deeply”, which intensify the semantic orientation of an evaluative word.
    ML-Ask incorporates the negation type of CVS with 108 syntactic negation structures. Examples of CVS negations in Japanese are structures such as:
  • あまり~ない amari -nai (not quite-)
  • ~とは言えない -to wa ienai (cannot say it is-)
  • ~てはいけない -te wa ikenai (cannot+[verb]-)
    As for intensifiers, although ML-Ask does not include them as a separate database, most Japanese intensifiers are included in the emoteme database.
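A rough Python sketch of how negation-type CVS could be applied is given below. It is an illustration under several assumptions, not the algorithm shipped in ML-Ask: only the three structures listed above are included (out of 108), the "~" slot is filled with the emotive expression plus a short gap of up to `MAX_GAP` characters to tolerate particles, and the names `is_negated`, `shift_valence` and `MAX_GAP` are hypothetical. Conjugated forms such as 嬉しくない are not handled.

```python
import re

# The three example structures from this page; "~" marks the slot for the
# evaluative (emotive) expression. The real system uses 108 structures.
CVS_NEGATIONS = ["あまり~ない", "~とは言えない", "~てはいけない"]

# Assumption: allow a few characters (e.g. particles) between the
# expression and the rest of the structure.
MAX_GAP = 3


def is_negated(sentence, expression):
    """Return True if the expression occurs inside one of the negation structures."""
    for structure in CVS_NEGATIONS:
        before, after = structure.split("~")
        pattern = (
            re.escape(before)
            + re.escape(expression)
            + ".{0," + str(MAX_GAP) + "}?"
            + re.escape(after)
        )
        if re.search(pattern, sentence):
            return True
    return False


def shift_valence(valence, negated):
    """Flip 'positive'/'negative' when the expression is negated."""
    if not negated:
        return valence
    return {"positive": "negative", "negative": "positive"}[valence]
```

For example, 好き (liking) in "あまり好きではない" is caught by the あまり~ない structure, so a positive valence would be shifted to negative, whereas "とても嬉しい" contains no negation structure and its valence is left unchanged.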

    Finally, ML-Ask implements Russell’s two-dimensional model of affect. The model assumes that all emotions can be represented in two dimensions: valence (positive/negative) and activation (activated/deactivated). An example of a negative-activated emotion is “anger”; an example of a positive-deactivated emotion is “relief”. The emotion classes annotated by the system are also mapped onto this model.
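The mapping onto the two-dimensional model can be pictured as a lookup table from emotion class to a (valence, activation) pair. In the Python sketch below, only the entries for anger and relief come from the description above; the entries for joy and sadness are illustrative guesses and are not taken from ML-Ask. The names `AffectPoint`, `AFFECT_SPACE` and `to_affect_space` are hypothetical.

```python
from typing import NamedTuple, Optional


class AffectPoint(NamedTuple):
    valence: str      # "positive" or "negative"
    activation: str   # "activated" or "deactivated"


AFFECT_SPACE = {
    # Stated on this page:
    "anger": AffectPoint("negative", "activated"),
    "relief": AffectPoint("positive", "deactivated"),
    # Assumptions for illustration only, not taken from ML-Ask:
    "joy": AffectPoint("positive", "activated"),
    "sadness": AffectPoint("negative", "deactivated"),
}


def to_affect_space(emotion: str) -> Optional[AffectPoint]:
    """Return the 2D position of an emotion class, or None if it is not mapped here."""
    return AFFECT_SPACE.get(emotion)
```

Classes that this sketch does not cover (for instance shame) simply return `None`; how the actual system places them is not described on this page.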

    ML-Ask ONLINE DEMO: soon.



    Released Files

    Released files related to the system.
    • emotions.zip (13 kB): Emotive expression lexicon used in ML-Ask. Zipped archive.
    • cvs.txt (2 kB): CVS structures used in ML-Ask. The system actually uses more, but these are the 72 most frequent structures.
    • mlask43-simple-noregex.zip (22 kB): Most recent version of ML-Ask-simple. Recommended for most users. Should work for most content in Japanese (blogs, fairytales, etc.). System written in Perl, zipped archive. Dependencies to install: mecab, mecab-perl-binding, re::engine::RE2.
    • mlask43-noregex.zip (77 kB): Most recent version of ML-Ask. I usually use it for analysis of conversation-like input in Japanese (IM, dialog agents, etc.). System written in Perl, zipped archive. Dependencies to install: mecab, mecab-perl-binding, re::engine::RE2.
    • mlask42212a-simple.zip (22 kB): ML-Ask-simple, ver. 4.2. System written in Perl, zipped archive. Dependencies to install: mecab, mecab-perl-binding, re::engine::RE2.
    • mlask42212.zip (77 kB): ML-Ask, ver. 4.2. System written in Perl, zipped archive. Dependencies to install: mecab, mecab-perl-binding, re::engine::RE2.
    • mlask40.zip (76 kB): ML-Ask 4.0. Older, slower, and memory-hungry; it generally works but has some bugs. System written in Perl, zipped archive. Dependencies to install: mecab, mecab-perl-binding, re::engine::RE2.
    • mlask31.rar (42 kB): ML-Ask 3.1. The oldest working version that has survived. This version implements the 2D affect model but does not (!) have working CVS. Not recommended to anyone for anything. Uploaded only for historical reasons.
    When using the above files, please cite the preferred citation listed under Main References.
    In case of problems with the above files, contact me.


    Development History

    Benchmarks of all versions of ML-Ask are available separately.
    • soon - try a threaded or forked version, expand lexicon, add online demo.
    • 2013.08.07 [ML-Ask 4.3.1]
      • Fixed a small bug in ML-Ask that caused a crash on startup due to a mismatch between UTF libraries.
    • 2013.05.20 [ML-Ask 4.3, codename: "noregex"]
      • Added a few foreach loops, but got rid of most regex.
      • Much faster than 4.2 (faster and more furious :-) ).
      • Needs much less memory.
      • Repaired a bug in 4.2 where, if the same emotive expression appeared in two emotion type databases, the system extracted only one emotion type.
      • Beginning with this version, I develop ML-Ask and ML-Ask-simple simultaneously.
    • 2011.10.27 [ML-Ask, codename: "simple"]
      • Additional version with no emoteme processing.
      • Created especially for processing of non-conversation-like contents, like blogs, fairytales, etc.
    • 2011.10.21 [ML-Ask 4.2, codename: "fast and furious"]
      • Official release of ML-Ask 4.2.
      • Added new algorithm for fast and precise emoticon detection,
      • optimized all regex (simplified, compressed, added anchors, got rid of irrelevant grouping/brackets),
      • added regex precompilation,
      • where possible, got rid of regex altogether in favor of simpler (faster) operations,
      • got rid of several loops,
      • improved processing speed (up to 10 times compared to 4.0),
      • using much less memory.
      • This version was used to annotate YACIS corpus.
    • 2011.09.27 [ML-Ask 4.0]
      • Official release of ML-Ask 4.0.
      • Got rid of many lines of code,
      • improved processing speed (3-6 times),
      • added RE2 regex engine for faster regex matching,
      • added additional interjection extraction with MeCab-perl-binding,
      • added basic emoticon database from CAO to detect emoticons,
      • added improved CVS algorithm,
      • added improved 2D-affect space mapping algorithm,
      • added processing of both whole files and STDIN.
    • around 2008 Nov [ML-Ask 3.0]
      • First attempt to add Russell's 2D affect space (still very clumsy).
    • around 2008 Sep [ML-Ask 2.0]
      • First attempt to support ML-Ask with CVS.
    • around 2007 Sep [ML-Ask 1.0]
      • First version of ML-Ask is created (keyword matching, no CVS, no 2D affect space).


    Main References

    Preferred Citation:
  • Michal Ptaszynski, Pawel Dybala, Rafal Rzepka and Kenji Araki, “Affecting Corpora: Experiments with Automatic Affect Annotation System - A Case Study of the 2channel Forum -”, In Proceedings of The Conference of the Pacific Association for Computational Linguistics (PACLING-09), September 1-4, 2009, Hokkaido University, Sapporo, Japan, pp. 223-228.

  • Michal Ptaszynski, Pawel Dybala, Wenhan Shi, Rafal Rzepka and Kenji Araki, “A System for Affect Analysis of Utterances in Japanese Supported with Web Mining”, Journal of Japan Society for Fuzzy Theory and Intelligent Informatics, Vol. 21, No. 2 (April), pp. 30-49 (194-213), 2009.