Michal Ptaszynski / Research

ML-Ask (all its versions and libraries) is released under the New BSD License.

ML-Ask: Affect Analysis System

This page contains information about ML-Ask, a system for Affect Analysis of textual input in Japanese. It is based on the linguistic assumption that the emotional states of a speaker are conveyed by emotional expressions used in emotive utterances. ML-Ask first separates emotive utterances from non-emotive ones, and then searches the emotive utterances for expressions of specific emotion types.

System Description

ML-Ask, or eMotive eLement and Expression Analysis system, is a keyword-based, language-dependent system for automatic affect annotation of utterances in Japanese. It uses a two-step procedure:

1. Specifying whether a sentence is emotive, and

2. Recognizing particular emotion types in utterances described as emotive.

ML-Ask is based on the idea of two-part classification of realizations of emotions in language into:

Emotive elements, or emotemes

Emotive expressions

I collected and hand-crafted a database of 907 emotemes, which includes groups such as:

interjections: すごい sugoi (great!)

mimetic expressions (gitaigo in Japanese): わくわく wakuwaku (heart pounding)

vulgar language: やがる -yagaru (syntactic morpheme used in verb vulgarization)

emotive sentence markers: “!” or “??” (sentence markers indicating emotiveness)
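
The first step, deciding whether a sentence is emotive, can be illustrated with a minimal sketch. It assumes a toy list built only from the examples above, plain substring matching, and a hypothetical rule that one emoteme is enough to call a sentence emotive; the names `EMOTEMES`, `find_emotemes` and `is_emotive` are invented for illustration and are not taken from the ML-Ask code.

```python
# Toy emoteme list built from the examples above (the real database has 907 entries).
EMOTEMES = {
    "すごい": "interjection",
    "わくわく": "mimetic expression",
    "やがる": "vulgar language",
    "!": "emotive sentence marker",
    "??": "emotive sentence marker",
}


def find_emotemes(sentence):
    """Return (emoteme, group) pairs found by plain substring search."""
    return [(emoteme, group) for emoteme, group in EMOTEMES.items() if emoteme in sentence]


def is_emotive(sentence):
    """Assumed rule: a sentence is emotive if it contains at least one emoteme."""
    return len(find_emotemes(sentence)) > 0


if __name__ == "__main__":
    for text in ["すごい!", "今日は月曜日です。"]:
        print(text, is_emotive(text), find_emotemes(text))
```

Substring matching misses conjugated forms and ignores word boundaries, so this only illustrates the idea of the emotive/non-emotive split.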

A set of features similar to what I define as emotemes has also been applied in other research on discriminating between emotive (emotional/subjective) and non-emotive (neutral/objective) sentences (see, for example, Wiebe et al., 2005; Wilson & Wiebe, 2005; or Aman & Szpakowicz, 2007).

Emotive expressions can be realized by various parts of speech and phrases, such as:

nouns: 愛情 aijou (love)

verbs: 悲しむ kanashimu (to feel sad, to grieve)

adjectives: 嬉しい ureshii (happy)

phrases: 虫唾が走る mushizu ga hashiru (to give one the creeps [from hate])
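
The second step, looking up emotion types for expressions found in an emotive utterance, might be sketched as below. The lexicon contains only the four examples above, and the emotion labels attached to them are illustrative assumptions, not entries copied from the actual database; `EMOTIVE_EXPRESSIONS` and `emotion_types` are hypothetical names.

```python
# Toy lexicon: expressions come from the examples above,
# but the emotion labels are assumed for illustration only.
EMOTIVE_EXPRESSIONS = {
    "愛情": ["fondness"],
    "悲しむ": ["sadness"],
    "嬉しい": ["joy"],
    "虫唾が走る": ["dislike"],
}


def emotion_types(sentence):
    """Map each lexicon expression found in the sentence to its emotion types."""
    found = {}
    for expression, types in EMOTIVE_EXPRESSIONS.items():
        if expression in sentence:
            found[expression] = types
    return found


if __name__ == "__main__":
    print(emotion_types("とても嬉しい"))
    print(emotion_types("今日は月曜日です"))
```

Each entry holds a list of types so that one expression can belong to more than one emotion class.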

As its collection of emotive expressions, ML-Ask uses a database created on the basis of Akira Nakamura’s “Emotive Expression Dictionary”. The emotive expression database is a collection of over two thousand expressions describing emotional states. It also incorporates an emotion classification reflecting Japanese language and culture. Each expression is classified as representing a specific emotion type, or more than one if applicable. The ten emotion types are: 喜 ki/yorokobi (joy, delight), 怒 dō/ikari (anger), 哀 ai/aware (sorrow, sadness, gloom), 怖 fu/kowagari (fear), 恥 chi/haji (shame, shyness, bashfulness), 好 kō/suki (liking, fondness), 厭 en/iya (dislike, detestation), 昂 kō/takaburi (excitement), 安 an/yasuragi (relief), and 驚 kyō/odoroki (surprise, amazement). The distribution of expressions across the emotion classes is shown in the table below.

Emotion class	Number of expressions	Emotion class	Number of expressions
dislike	532	fondness	197
excitement	269	fear	147
sadness	232	surprise	129
joy	224	relief	106
anger	199	shame	65
		Sum	2100
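
As a quick arithmetic check of the table, the counts can be summed and turned into percentage shares. The snippet only restates the numbers above; `COUNTS`, `total` and `shares` are names chosen for this sketch.

```python
# Counts copied from the table above.
COUNTS = {
    "dislike": 532, "excitement": 269, "sadness": 232, "joy": 224, "anger": 199,
    "fondness": 197, "fear": 147, "surprise": 129, "relief": 106, "shame": 65,
}

total = sum(COUNTS.values())

# Share of each emotion class in percent, rounded to one decimal place.
shares = {emotion: round(100 * count / total, 1) for emotion, count in COUNTS.items()}

if __name__ == "__main__":
    print(total)
    for emotion, share in shares.items():
        print(f"{emotion}: {share}%")
```

The sum matches the 2100 given in the table; dislike is the largest class at roughly a quarter of all entries, and shame is the smallest.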

ML-Ask also implements the idea of Contextual Valence Shifters (CVS) for Japanese. The idea of CVS, as proposed by Polanyi and Zaenen (2006), assumes two kinds of CVS: negations and intensifiers. Negations are words and phrases like “not”, “never” or “not quite”, which change the semantic polarity of the evaluative word they refer to. Intensifiers are words like “very”, “very much” or “deeply”, which intensify the semantic orientation of an evaluative word.

ML-Ask incorporates the negation type of CVS with 108 syntactic negation structures. Examples of CVS negations in Japanese are structures such as:

あまり～ない amari -nai (not quite-)

～とは言えない -to wa ienai (cannot say it is-)

～てはいけない -te wa ikenai (cannot+[verb]-)

As for intensifiers, although ML-Ask does not include them as a separate database, most Japanese intensifiers are included in the emoteme database.
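
A minimal sketch of negation-type CVS handling follows. It assumes the three example structures can be approximated with regular expressions (the “～” slot becomes a wildcard or is simply dropped) and that a detected negation flips a positive/negative valence label, which follows the definition of negations given above. How ML-Ask itself applies its 108 structures is not described here, so `NEGATION_PATTERNS`, `has_negation` and `shift_valence` are hypothetical.

```python
import re

# Regex approximations of the three example structures above.
NEGATION_PATTERNS = [
    re.compile(r"あまり.*ない"),  # あまり～ない (not quite-)
    re.compile(r"とは言えない"),  # ～とは言えない (cannot say it is-)
    re.compile(r"てはいけない"),  # ～てはいけない (cannot+[verb]-)
]

FLIP = {"positive": "negative", "negative": "positive"}


def has_negation(sentence):
    """True if any of the example negation structures occurs in the sentence."""
    return any(pattern.search(sentence) for pattern in NEGATION_PATTERNS)


def shift_valence(valence, sentence):
    """Assumed rule: flip the valence label when a negation structure is present."""
    return FLIP[valence] if has_negation(sentence) else valence


if __name__ == "__main__":
    print(shift_valence("positive", "あまり嬉しくない"))
    print(shift_valence("positive", "嬉しい"))
```

The patterns are compiled once at import time, so repeated calls only pay for the matching itself.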

Finally, ML-Ask implements Russell’s two-dimensional model of affect. The model assumes that all emotions can be represented in two dimensions: valence (positive/negative) and activation (activated/deactivated). An example of a negative-activated emotion is “anger”; a positive-deactivated emotion is, e.g., “relief”. The emotion classes annotated by the system are also generalized onto this model.
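
The two-dimensional representation can be sketched as a lookup from emotion class to a (valence, activation) pair. Only the two placements named above (anger and relief) are included, since the positions of the other classes are not stated here; `Affect`, `AFFECT_SPACE` and `to_affect` are names made up for this example.

```python
from typing import NamedTuple, Optional


class Affect(NamedTuple):
    valence: str      # "positive" or "negative"
    activation: str   # "activated" or "deactivated"


# Only the two placements given in the text; other classes are left out on purpose.
AFFECT_SPACE = {
    "anger": Affect("negative", "activated"),
    "relief": Affect("positive", "deactivated"),
}


def to_affect(emotion: str) -> Optional[Affect]:
    """Return the 2D placement of an emotion class, or None if it is not listed."""
    return AFFECT_SPACE.get(emotion)


if __name__ == "__main__":
    print(to_affect("anger"))
    print(to_affect("relief"))
    print(to_affect("joy"))
```

Returning `None` for unlisted classes keeps the sketch from guessing placements that this page does not specify.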

ML-Ask ONLINE DEMO: soon.

A presentation on ML-Ask was given at PACLING 2009; the corresponding paper is the preferred citation under Main References.

Released Files

Released files related to the system.

File Name	Description	Size
emotions_20240804.zip	Newest and final emotive expression lexicon for ML-Ask. Zipped archive. Includes both the manual dictionary expansion by Wang and the automatic dictionary expansion by Isomura. Also includes citation information for related publications and PDF files of those publications.	2.4 MB
emotions.zip	Emotive expression lexicon used in ML-Ask. Zipped archive.	13 kB
cvs.txt	CVS structures used in ML-Ask. The system uses more, but these are the 72 most frequent structures.	2 kB
mlask43-simple-noregex.zip	Most recent version of ML-Ask-simple. Recommended for most users. Should work for most content in Japanese (blogs, fairytales, etc.). System written in Perl, zipped archive. Dependencies to install: mecab, mecab-perl-binding, re::engine::RE2	22 kB
mlask43-noregex.zip	Most recent version of ML-Ask. I usually use it for analysis of conversation-like input in Japanese (IM, dialog agents, etc.). System written in Perl, zipped archive. Dependencies to install: mecab, mecab-perl-binding, re::engine::RE2	77 kB
mlask42212a-simple.zip	ML-Ask-simple, ver. 4.2. System written in Perl, zipped archive. Dependencies to install: mecab, mecab-perl-binding, re::engine::RE2	22 kB
mlask42212.zip	ML-Ask, ver. 4.2. System written in Perl, zipped archive. Dependencies to install: mecab, mecab-perl-binding, re::engine::RE2	77 kB
mlask40.zip	ML-Ask 4.0. Older, slower, and memory-hungry; it generally works but has some bugs. System written in Perl, zipped archive. Dependencies to install: mecab, mecab-perl-binding, re::engine::RE2	76 kB
mlask31.rar	ML-Ask 3.1. The oldest working version that remains. This version implements the 2D affect model, but doesn't (!) have working CVS. Not recommended to anyone for anything. Uploaded only for historical reasons.	42 kB
When using the above files, please cite the preferred citation. In case of problems with the above files, contact me.

Development History

Benchmarks of all versions of ML-Ask are available here.

soon - try a threaded or forked version, expand lexicon, add online demo.
2013.08.07 [ML-Ask 4.3.1]
- Fixed a small bug in ML-Ask that caused a crash on startup due to mismatched UTF libraries.
2013.05.20 [ML-Ask 4.3, codename: "noregex"]
- Added a few foreach loops, but got rid of most regex.
- Much faster than 4.2 (faster and more furious :-) ).
- Needs much less memory.
- Repaired a bug in 4.2 where if there was the same emotive expression in two emotion type databases the system extracted only one emotion type.
- Beginning with this version, I develop ML-Ask and ML-Ask-simple simultaneously.
2011.10.27 [ML-Ask 4.2.2.1.2a, codename: "simple"]
- Additional version with no emoteme processing.
- Created especially for processing of non-conversation-like contents, like blogs, fairytales, etc.
2011.10.21 [ML-Ask 4.2, codename: "fast and furious"]
- Official release of ML-Ask 4.2.
- Added new algorithm for fast and precise emoticon detection,
- optimized all regex (simplified, compressed, added anchors, got rid of irrelevant grouping/brackets),
- added regex precompilation,
- where possible got rid of regex altogether in favor of simpler (faster) operations,
- got rid of several loops,
- improved processing speed (up to 10 times compared to 4.0),
- using much less memory.
- This version was used to annotate YACIS corpus.
2011.09.27 [ML-Ask 4.0]
- Official release of ML-Ask 4.0.
- Got rid of many lines of code,
- improved processing speed (3-6 times),
- added RE2 regex engine for faster regex matching,
- added additional interjection extraction with MeCab-perl-binding,
- added basic emoticon database from CAO to detect emoticons,
- added improved CVS algorithm,
- added improved 2D-affect space mapping algorithm,
- added processing of both whole files and STDIN.
around 2008 Nov [ML-Ask 3.0]
- First attempt to add Russell's 2D affect space (still very clumsy).
around 2008 Sep [ML-Ask 2.0]
- First attempt to support ML-Ask with CVS.
around 2007 Sep [ML-Ask 1.0]
- First version of ML-Ask is created (keyword matching, no CVS, no 2D affect space).

Main References

Preferred Citation:

Michal Ptaszynski, Pawel Dybala, Rafal Rzepka and Kenji Araki, “Affecting Corpora: Experiments with Automatic Affect Annotation System - A Case Study of the 2channel Forum -”, In Proceedings of The Conference of the Pacific Association for Computational Linguistics (PACLING-09), September 1-4, 2009, Hokkaido University, Sapporo, Japan, pp. 223-228.

Michal Ptaszynski, Pawel Dybala, Wenhan Shi, Rafal Rzepka and Kenji Araki, “A System for Affect Analysis of Utterances in Japanese Supported with Web Mining”, Journal of Japan Society for Fuzzy Theory and Intelligent Informatics, Vol. 21, No. 2 (April), pp. 30-49 (194-213), 2009.