Last Modified: December 30, 2020


Pun database

 

 We are releasing the pun database created under Grant-in-Aid for Scientific Research (C) (project number: 17K00294, project name: Evaluation method of humor fun and development of standard data set).

 

Overview

 

Includes 67,000 puns

 

Each pun is annotated with its seed expression, transformation expression, and type tag

Type: Coexistence type, 1. Perfect
Explanation: The seed expression and the transformation expression are pronounced exactly the same.
Example: (Taisho) ga [taisho] (The general wins the grand prize.)

Type: Coexistence type, 2. Imperfect
Explanation: The seed expression and the transformation expression are similar in pronunciation.
Example: (Kichinto) seiri sa re ta [kittin] (A neatly organized kitchen.)

Type: 3. Superimposed type
Explanation: The seed expression exists in background knowledge or context but does not appear explicitly in the pun.
Example: [Sui ma senbaduru] (Excuse me, a thousand origami cranes.)

Type: 4. Unknown
Explanation: The text cannot be interpreted as a pun.
Example: A , are yama da ! (Oh, that is a mountain.)

Seed expression ( ), Transformation expression [ ], Reading < >

 

Each pun is segmented into words separated by spaces using the morphological analysis tool MeCab (https://taku910.github.io/mecab/).

Each pun carries funniness scores assigned by three evaluators on a 5-point scale

 5: Very interesting, 4: Interesting, 3: Normal, 2: Not interesting, 1: Not interesting at all (including items the evaluator does not consider a pun)
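The relation between the three evaluator scores and the average score can be sketched as follows. This is a minimal illustration, assuming the average is a plain arithmetic mean rounded to two decimal places; the function name `average_score` and the input scores 4, 2, and 3 are hypothetical and are not part of the released data or tooling.

```python
def average_score(scores):
    """Return the mean of three 1-5 scores, rounded to two decimals.

    Assumption: the database's average is an arithmetic mean of the
    three evaluator scores.
    """
    if len(scores) != 3:
        raise ValueError("expected exactly three evaluator scores")
    for s in scores:
        # Each score must lie on the 5-point scale described above.
        if s not in (1, 2, 3, 4, 5):
            raise ValueError(f"score out of range: {s}")
    return round(sum(scores) / len(scores), 2)


print(f"{average_score([4, 2, 3]):.2f}")  # prints 3.00
```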

 · Format

 Serial number, original text (untagged), tagged text, type, score 1, score 2, score 3, average score

 

Example

73, Karenda- Darenda- Orenda-, (Karenda-)12 [Dare n da-]1 [Ore n da-]2, 22, 4, 2, 3, 3.00

English translation: Calendar, who is it?, it's mine.

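This pun contains multiple transformation expressions

Seed expression: Karenda-

Transformation expressions: Darenda-, Orenda-

A number written after each seed expression and transformation expression indicates which ones correspond; here the seed expression carries 12 because it pairs with both transformation expression 1 and transformation expression 2

Pun types in order of appearance: 22 (both transformation expressions are type 2, Imperfect)

Evaluator scores 1, 2, and 3 and their average are 4, 2, 3, and 3.00, respectively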
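A record in this format could be read along the following lines. This is only a sketch under stated assumptions: fields are separated by commas and never contain a comma themselves, correspondence numbers and type codes are single digits, and the names `PunRecord`, `parse_record`, and `TAG_PATTERN` are hypothetical, not part of any tool distributed with the database.

```python
import re
from dataclasses import dataclass

# "(seed)12" or "[transformation]1": bracketed text plus optional digits.
TAG_PATTERN = re.compile(
    r"\((?P<seed>[^)]*)\)(?P<seed_ids>\d*)"
    r"|\[(?P<trans>[^\]]*)\](?P<trans_ids>\d*)"
)


@dataclass
class PunRecord:
    serial: int
    text: str               # second field: the untagged pun
    seeds: list             # [(expression, [correspondence numbers])]
    transformations: list   # same shape as seeds
    types: list             # one type code per digit, in order of appearance
    scores: list            # three evaluator scores
    average: float


def parse_record(line: str) -> PunRecord:
    # Assumption: exactly eight comma-separated fields per record.
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    serial, text, tagged, type_str, s1, s2, s3, avg = fields

    seeds, transformations = [], []
    for m in TAG_PATTERN.finditer(tagged):
        if m.group("seed") is not None:
            ids = [int(d) for d in m.group("seed_ids")]
            seeds.append((m.group("seed"), ids))
        else:
            ids = [int(d) for d in m.group("trans_ids")]
            transformations.append((m.group("trans"), ids))

    return PunRecord(
        serial=int(serial),
        text=text,
        seeds=seeds,
        transformations=transformations,
        types=[int(c) for c in type_str],
        scores=[int(s1), int(s2), int(s3)],
        average=float(avg),
    )


record = parse_record(
    "73, Karenda- Darenda- Orenda-, "
    "(Karenda-)12 [Dare n da-]1 [Ore n da-]2, 22, 4, 2, 3, 3.00"
)
print(record.seeds)            # [('Karenda-', [1, 2])]
print(record.transformations)  # [('Dare n da-', [1]), ('Ore n da-', [2])]
```

Splitting on commas is the simplest choice for the example record; if the actual files quote fields or contain commas inside a pun, a proper CSV reader would be the safer option.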

 

 Terms of use

 

 Use of the data is conditional on your agreement to the following items. If you would like to use it, please contact Araki by email (araki (at mark) ist.hokudai.ac.jp).

 

1. The data will be used only for academic research and will not be used for commercial purposes.

2. You will not publish part or all of the data to a location accessible to third parties.

3. You will not redistribute part or all of the data.

4. If the copyright holder or distributor of the data requests the deletion of part or all of the data, you will promptly delete the relevant data from all computers and media and notify the requester that the data has been deleted.

5. When publishing the results of academic research conducted using the data, you will clearly state that the data developed under Grant-in-Aid for Scientific Research (C) (project number: 17K00294) was used. Reference: "Kenji Araki, Koichi Sayama, Yuzu Uchida, Motoki Yatsu: Expansion and Analysis of a Pun Database, Japanese Society for Artificial Intelligence Type 2 Study Group Language Engineering Study Group Material, SIG-LSE-B803-1, pp.1-15, 2018."

6. The distributor is not responsible for any disadvantages caused by using the data.

7. If any matter not stipulated in this agreement arises, the parties will discuss it in good faith and resolve the matter.

