Personality Detection Khajeh Nasir Toosi University (PD_KNTU)

Terms of use:
1) This corpus can be used freely for research purposes.
2) If you are interested in commercial use of the corpus, send an email to the contact.
Please feel free to send us an email:
-with feedback regarding the corpus.
-with information on how you have used the corpus.
-if you are interested in a collaborative research project.


Mohammad Mobasher

Full Description

Knowledge about an individual's personality allows us to make predictions about preferences across contexts and environments, and can enhance recommendation systems. Beyond recommendation, it is also useful for commercial purposes and can help in understanding the mental health and high-risk factors of online users. Few gold-standard datasets from social media platforms are available for the personality prediction task, mainly because gathering labeled data is time-consuming and expensive. That is why we decided to create a suitable dataset.

We present PD_KNTU, a novel corpus developed to aid research in personality detection. The PD_KNTU corpus is a list of 1500 people who took a personality test and shared the results on the Twitter social network. Each user is represented by 17 features:
1) id
2) screen_name
3) name
4) url
5) followers_count
6) friends_count
7) status_count
8) favorites_count
9) listed_count
10) description
11) contributors
12) enable
13) protected
14) location
15) lang
16) MBTI
17) gender
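
The 17 features above can be pictured as one record per user. The following is a minimal sketch only: the class name PDKNTUUser, the field types, and the helper is_valid_mbti are assumptions made for illustration and are not taken from the corpus files, whose actual storage format is not described here. The only external fact used is that an MBTI type is a four-letter code built from E/I, S/N, T/F and J/P.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PDKNTUUser:
    """Hypothetical container for one user; types are assumed, not documented."""
    id: int
    screen_name: str
    name: str
    url: Optional[str]
    followers_count: int
    friends_count: int
    status_count: int
    favorites_count: int
    listed_count: int
    description: Optional[str]
    contributors: Optional[str]
    enable: Optional[bool]
    protected: bool
    location: Optional[str]
    lang: Optional[str]
    MBTI: str
    gender: Optional[str]


# Field names in the same order as the numbered list above.
FEATURE_NAMES = [f.name for f in fields(PDKNTUUser)]


def is_valid_mbti(label: str) -> bool:
    """Check that a label is a four-letter MBTI code such as 'INTJ'."""
    label = label.strip().upper()
    pairs = ("EI", "SN", "TF", "JP")
    return len(label) == 4 and all(ch in pair for ch, pair in zip(label, pairs))
```

A check such as is_valid_mbti(user.MBTI) could be used to filter out malformed labels before training a classifier.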
For each user we downloaded at most 3000 recent tweets (some users have no tweets). The final corpus contains 3182352 tweets.
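
As a rough sanity check on these figures, the short sketch below compares the reported total with the largest total the per-user cap would allow and computes the mean number of tweets per user. The variable names are illustrative, and dividing by all 1500 users (including those with no tweets) is an assumption.

```python
# Figures as stated in this description.
NUM_USERS = 1500
MAX_TWEETS_PER_USER = 3000
TOTAL_TWEETS = 3182352

# Largest possible corpus size if every user reached the cap.
upper_bound = NUM_USERS * MAX_TWEETS_PER_USER

# Mean tweets per user, counting users without tweets (assumption).
mean_tweets = TOTAL_TWEETS / NUM_USERS

print(f"upper bound: {upper_bound}")
print(f"mean tweets per user: {mean_tweets:.1f}")
```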
In this corpus we collected 50 people and, for each person, 1000 recent tweets.
The author thanks Dr. Farzi for exchanging ideas in this area.

Download Page