Sentiment Analysis Part 7: Calculating the trends

A big part of doing sentiment analysis on my responses to specific types of questions is that I want to evaluate the trend of my mood and, hopefully, improve it.  It’s not just that I want to use more positive language when I discuss various topics, although training myself to do that would be nice.  Rather, I am hoping to correlate my sentiment with other aspects of my well-being as part of a comprehensive self-coaching program to improve my general well-being and live a good life.

The Python library NumPy makes it very easy to calculate a trend line from a set of data.  Here I am working with the positive/negative scores that I have been collecting as I respond to automatically generated questions several times a day (see links at the bottom of this post for more detail).  The point of the code below is to calculate the slope of the trend line for that data and then to rate my progress based on whether the slope is positive or negative: to move, in short, from an “ugly” or “bad” trend to a “fantastic” trend.

This process gives me the slope of the trend line associated with the scores of my primary sentiments, i.e. sentiments that I express in categories of topics like “Health” or “Family”.  The goal is then to see a positive slope in that trend line.  In other words, to improve the overall positivity of the sentiments I express related to the topics I am tracking.
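A minimal sketch of that calculation, assuming the scores arrive as a simple list ordered in time. `np.polyfit` with degree 1 returns the slope `m` and intercept `b` of the least-squares line. The function names, the `steep` cutoff, and the way the labels “ugly”, “bad” and “fantastic” map onto slope values are assumptions made for illustration, not the exact code behind the screenshot.

```python
import numpy as np


def trend_slope(scores):
    """Return (m, b) for the least-squares line through the scores."""
    x = np.arange(len(scores))        # 0, 1, 2, ... stands in for time
    m, b = np.polyfit(x, scores, 1)   # degree-1 fit: score ~ m*x + b
    return float(m), float(b)


def rate_trend(m, steep=0.01):
    """Map a slope to a label; the cutoffs are illustrative assumptions."""
    if m <= -steep:
        return "ugly"
    if m < 0:
        return "bad"
    return "fantastic"


# Sample positive-sentiment scores, one per day.
scores = [0.437, 0.488, 0.499, 0.477, 0.609, 0.399, 0.430]
m, b = trend_slope(scores)
print(f"m = {m:.4f} -> {rate_trend(m)}")
```

Running the same function once per topic (“Health”, “Family”, and so on) would give one slope per category.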

Here is what the calculation of the slope of the trend line looks like.  Note the value of the variable “m” indicated by the red arrow:

The slope of the trend for the sentiments I am tracking is negative. So I need to use more positive language about those topics.

Here are links that take you to each of the other posts in this series:

Sentiment Analysis Part 6: Scoring the Sentiment Questions

The table below shows the average scores for positive/negative sentiment for each of the questions I have responded to since I started building an automated life coaching system based on sentiment analysis and machine learning.  The next step is to group the questions by topic area and then to use the scores to suggest improvements.  For example, if my responses to questions about my mother consist of largely negative language, then the automated coaching system may generate an email that recommends that I try to use more positive language when talking about the past as it relates to my family.

Sentiment Topics with Scores



My initial grouping of all of the questions results in the following ranking by area from most positive sentiments expressed to the least positive sentiments expressed.

All topic areas
The scores for all topic areas that I have responded to thus far.
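The grouping and ranking step can be sketched as follows, assuming each scored question has already been tagged with a topic area. The rows, topic labels and scores here are hypothetical and only show the shape of the data; the function name is also an assumption.

```python
from collections import defaultdict
from statistics import mean


def rank_topics(scored_questions):
    """Average the scores per topic and sort from most to least positive.

    scored_questions: iterable of (topic, question, avg_positive_score).
    """
    by_topic = defaultdict(list)
    for topic, _question, score in scored_questions:
        by_topic[topic].append(score)
    averages = {topic: mean(scores) for topic, scores in by_topic.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows: topics and scores are made up for illustration.
rows = [
    ("Family", "How important is your family to you?", 0.62),
    ("Family", "What do your parents do?", 0.48),
    ("Health", "Do you smoke?", 0.41),
    ("Money", "How important is money to you?", 0.58),
]
for topic, avg in rank_topics(rows):
    print(f"{topic:<8} {avg:.3f}")
```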

 

Here is how the questions look when grouped together by topic area for the topics that I generally express the most positive sentiments about:

Positive Sentiment
The questions and topic areas where I express the most positive sentiments consistently.

 

It is also interesting to note the topics I express the least positive sentiments about:

Negative Sentiment
The questions and topic areas that I express the least positive sentiment about.

Overall the goal is to use these scores together with the mood survey data to guide an AI system to generate scripted coaching plans that will help me to improve my sentiment in areas that are consistently scoring low.  For example, to break out each topic area and look at its trend line and then generate scripted responses for each area.  Here they are aggregated into one chart:

 


Sentiment Analysis Part 5: Reporting on Mood Survey Data

As part of my daily sentiment analysis I have also been logging the results of a mood survey several times daily.  The goal of collecting this data is to track how well I am doing (or think I am doing) in seven areas that relate to overall well being and then also to see how those scores change over time.  Here are my averages for the last several days of data.

10 is the best score for each category and 1 is the worst score.

DAILY MOOD SURVEY AVERAGES



A big part of collecting this data (the sentiment analysis + mood survey) is that I want to use these numbers to generate scripted responses that function as life coaching prompts.  For example, if I am consistently answering the emailed sentiment analysis questions with negative language and my mood survey scores are deteriorating over time, a human coach would conclude that I probably need to pay extra attention to some aspect of my life I have been neglecting, one that would help improve my perceptions/mood/sentiment.  In other words, I am hoping to supplement or even replace human coaching by tracking my numbers and reporting on them in a way that is analogous to a coach spending time helping me find out what should change in my habits or routines.  The ability to keep track of mood/sentiment data and report on it seems to me to be essential to self-coaching.  It’s one of the main things I find useful about having a Fitbit.  The sentiment tracking and mood surveys are an attempt to create a sort of Fitbit for my brain.  I can see the trends and identify small areas that, if improved, would influence the overall numbers.
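As a rough sketch of the kind of rule described here, the function below returns a scripted prompt only when a mood category is trending down and the recent answers are mostly negative. The function names, the 0.5 cutoff and the wording of the prompt are all assumptions; a real coaching script would likely use more signals.

```python
import numpy as np


def mood_slope(scores):
    """Slope of the least-squares line through a series of 1-10 mood scores."""
    x = np.arange(len(scores))
    return float(np.polyfit(x, scores, 1)[0])


def coaching_prompt(category, mood_scores, negative_share):
    """Return a scripted prompt, or None when nothing needs attention.

    negative_share: fraction of recent responses classified as negative.
    """
    if len(mood_scores) < 2:
        return None  # not enough data to talk about a trend
    declining = mood_slope(mood_scores) < 0
    mostly_negative = negative_share > 0.5  # assumed cutoff
    if declining and mostly_negative:
        return (
            f"Your '{category}' scores have been slipping and your recent "
            "answers lean negative. What is one small thing you could "
            "change this week?"
        )
    return None


print(coaching_prompt("Eat Well", [8, 7, 6, 5], negative_share=0.6))
```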

 


Sentiment Analysis Part 3: Survey Process Revised

I used to have a 750 words account, which I gave up mainly because I did not find the charts to be all that informative, although I enjoyed the habit of writing every day.  That website, along with my interest in the topic of “Quantified Self”, inspired me to start exploring the notion of tracking my sentiment (an introduction to my project is posted here).  Since I started this project, my interest has expanded into collecting data about my mood so that I can match it up with my positive/negative sentiments and also connect both mood and sentiment data to the exercise data being collected by my Fitbit.  To that end I deployed a simple script (below) to the server that is running my sentiment analysis project that sends me a questionnaire every couple of hours:
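A script of this kind might look roughly like the sketch below, which builds an HTML email whose clickable image links to the survey. The form URL, subject line, SMTP host and the exact unsplash.it URL pattern are assumptions, and the “every couple of hours” schedule is left to a scheduler such as cron rather than handled in the script.

```python
import smtplib
from email.message import EmailMessage

FORM_URL = "https://example.com/mood-form"         # hypothetical survey link
IMAGE_URL = "https://unsplash.it/600/400/?random"  # assumed random-image URL


def build_mood_email(sender, recipient):
    """Build the reminder email: plain-text fallback plus a clickable image."""
    msg = EmailMessage()
    msg["Subject"] = "How are you doing right now?"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Log your mood: {FORM_URL}")
    msg.add_alternative(
        f'<a href="{FORM_URL}"><img src="{IMAGE_URL}" alt="Log your mood"></a>',
        subtype="html",
    )
    return msg


def send(msg, host="localhost", port=25):
    """Hand the message to an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


message = build_mood_email("me@example.com", "me@example.com")
print(message["Subject"])
```

Calling `send(message)` from a job scheduled every two hours would complete the loop.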

Here (below) are two examples of the email I receive from the script.  I started including the images because it seemed to me that I am actually happier when I click on them rather than just clicking on a link.  The images are generated randomly from the site unsplash.it, so I never know what they will be:

The email I receive several times a day inviting me to log my mood
Two examples of the email I receive to log my mood

Clicking the image takes me to a Google Form with seven variables that I track.  Each variable is given a score on a 1-10 scale, and the order in which they are presented is randomized, i.e. they appear in a different order each time I fill out the form.  I include a screenshot of the form here so that it’s clear how this process is working:

The seven areas I track as part of my mood log

I think it is important that I do the survey several times a day and then average the results because at some points in my day I might feel good about the “Eat Well” category, for example, and then later in the day not feel the same way.  After this survey has been running for a week or two I will start sharing charts that show how my rating of the seven categories fluctuates over time.

UPDATE:  I revised the survey process to put the data directly into the database on the server rather than using Google Forms.  The process of transferring the data from a Google Form to the database was a bit of a hassle, and Google really intends such forms to connect to Google App Engine, which isn’t ideal for me.  Including the text/code here for inserting the form data directly into the database.
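A minimal sketch of such an insert, assuming the form posts a mapping of category name to score. It uses Python’s built-in `sqlite3` as a stand-in for the server database, and the table name `mood_log`, its columns and the function name are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def save_survey(conn, scores, submitted_at=None):
    """Insert one survey submission; scores maps category name -> 1..10."""
    submitted_at = submitted_at or datetime.now(timezone.utc).isoformat()
    rows = []
    for category, score in scores.items():
        score = int(score)
        if not 1 <= score <= 10:
            raise ValueError(f"{category}: score must be between 1 and 10")
        rows.append((submitted_at, category, score))
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO mood_log (submitted_at, category, score) "
            "VALUES (?, ?, ?)",  # placeholders keep form input out of the SQL
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mood_log (submitted_at TEXT, category TEXT, score INTEGER)"
)
save_survey(conn, {"Eat Well": 7})
print(conn.execute("SELECT category, score FROM mood_log").fetchall())
```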


Sentiment Analysis Part 4: Training the Bot to Respond

The main goal is to get the Sentiment Analyzer to respond to incoming messages in unexpected ways, based on your mood and on the content of your messages.  So, for example, I don’t want to receive kitten picture responses and nothing else just because I am in a bad mood.  I would like the Sentiment Analyzer to say something relevant back to me, as a friend might do.  The main idea therefore is to train a chat bot on incoming responses to the questions being asked.

 

Thus far the bot will produce responses such as this (below) based on the data it has collected from the responses I have given to the questions I am receiving over the last week or so:

Obviously, this is not a very happy bot right now, but the idea is to key the questions being emailed and the responses to positive/negative values from the NLTK processor and then to use that data to give the interactions personality that is novel and fun.


 


Connect KNIME to a .sqlite file on Windows

In this post I want to show how to load a large SQLite file into KNIME so that you can write queries against it, much as you can with pandas in a Jupyter Notebook.  For the impatient, here is a link to a screencast that shows the steps below in a short video: http://screencast.com/t/upGxA9Yvqs

First you need the SQLite JDBC driver (a jar file), which you can download from xerial here:  https://bitbucket.org/xerial/sqlite-jdbc/downloads

Once you have the jar file, navigate to File > Preferences > KNIME > Databases and select add file (KNIME documentation here):

Add SQLite jar file to KNIME

Once you have the jar file installed you can create a new workflow:

Create a new KNIME workflow

In the new workflow select “Database” and then move the SQLite connector to the canvas area:

SQLite Connector in KNIME

Double-click the SQLite connector to add a file to your path (click “Browse”):

Select the SQLite file to add to the path

Use the “Database Reader” tool to load the available tables, as in the screenshot below.  In this case I am using the dump of all comments on Reddit from May 2015 (which you can download from Kaggle here:  https://www.kaggle.com/reddit/reddit-comments-may-2015).  This file is about 30 GB of data.

Database Reader tool
Fetch Metadata using the Database Reader tool and write a query

Add the “CSV Writer” tool from the IO menu and run the query.

CSV Write output
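For comparison with the KNIME workflow, the same connect, query and export sequence can be sketched in Python using only the standard library. The function name is an assumption, and so are the file name, table name (`May2015`) and column in the usage comment. Rows are streamed in batches so that a 30 GB file is never loaded into memory at once.

```python
import csv
import sqlite3


def export_query_to_csv(db_path, query, csv_path, batch_size=10_000):
    """Run a query against a SQLite file and stream the rows to a CSV file."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(query)
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            # Header row comes from the column names of the result set.
            writer.writerow([column[0] for column in cursor.description])
            while True:
                rows = cursor.fetchmany(batch_size)
                if not rows:
                    break
                writer.writerows(rows)
    finally:
        conn.close()


# Usage (file, table and column names are assumptions about the dataset):
# export_query_to_csv(
#     "database.sqlite",
#     "SELECT subreddit, COUNT(*) AS n FROM May2015 GROUP BY subreddit",
#     "subreddit_counts.csv",
# )
```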

In later posts I will show this same set of steps using both RapidMiner and Alteryx for comparison.

Do I hate Mondays?

The table and pie chart below will automatically update several times a day based on my responses to the sentiment analysis emails I am sending myself (see explanation here).  Eventually I will have a clear answer to the question, “Do I have a case of the Mondays?”  🙂

Positive vs Negative Sentiment by Day of the Week

Weekday   Pos Response   Neg Response   Neutral Response
Fri       9              13             9
Mon       13             11             9
Sat       9              11             5
Sun       5              8              6
Thu       9              8              7
Tue       12             21             7
Wed       9              11             4
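A table like this can be produced by counting classified responses per weekday. The sketch below assumes each response is stored as a date string plus a label (`pos`, `neg` or `neutral`); the function name and sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime


def count_by_weekday(responses):
    """Count responses per (weekday, label).

    responses: iterable of (date string "YYYY-MM-DD", label).
    """
    counts = Counter()
    for date_string, label in responses:
        weekday = datetime.strptime(date_string, "%Y-%m-%d").strftime("%a")
        counts[(weekday, label)] += 1
    return counts


# Hypothetical sample rows, not the real data behind the table.
sample = [
    ("2016-08-01", "neg"),
    ("2016-08-01", "pos"),
    ("2016-08-08", "neg"),
    ("2016-08-02", "neutral"),
]
for (weekday, label), n in sorted(count_by_weekday(sample).items()):
    print(weekday, label, n)
```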

Here is the overall happiness-o-meter as charted using the running total of positive vs negative responses displayed in the table above:

The following table shows the average of the percent of positive/neutral/negative content that is detected by sentiment analysis across all of my responses.  In other words, the percentage positive/neutral/negative that makes up each of my responses:

Likelihood of Response Type

percent_negative   percent_neutral   percent_positive   total_observations
0.523              0.322             0.477              196

It should be noted that every response has some degree of positive/neutral/negative language, so overall the response is classified as either positive or negative based on which component predominates.  Here are the averages by date for each type of response:
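The “predominant component” rule can be sketched as a small function. The positive/negative comparison follows the description above; treating a response as neutral when its neutral share passes a cutoff is an assumption added to account for the neutral counts in the weekday table, and the 0.5 value is illustrative.

```python
def classify(pos, neg, neutral, neutral_cutoff=0.5):
    """Label a response from its positive/negative/neutral components."""
    if neutral > neutral_cutoff:   # assumed rule for neutral responses
        return "neutral"
    return "pos" if pos > neg else "neg"


print(classify(pos=0.437, neg=0.563, neutral=0.287))
print(classify(pos=0.609, neg=0.391, neutral=0.245))
```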

Daily Averages by Response Type (rows 1-20 of 22)

date         pos     neg     neutral
08/02/2016   0.437   0.563   0.287
08/03/2016   0.488   0.512   0.293
08/04/2016   0.499   0.501   0.327
08/05/2016   0.477   0.523   0.367
08/06/2016   0.609   0.391   0.245
08/07/2016   0.399   0.601   0.336
08/08/2016   0.430   0.570   0.430
08/09/2016   0.554   0.446   0.280
08/10/2016   0.461   0.539   0.274
08/11/2016   0.526   0.474   0.304
08/12/2016   0.455   0.545   0.302
08/13/2016   0.458   0.542   0.395
08/14/2016   0.489   0.511   0.331
08/15/2016   0.649   0.351   0.266
08/16/2016   0.411   0.589   0.259
08/17/2016   0.421   0.579   0.230
08/18/2016   0.498   0.502   0.390
08/19/2016   0.448   0.552   0.417
08/20/2016   0.438   0.562   0.357
08/21/2016   0.484   0.516   0.308

 


Sentiment Analysis Part 2: Getting Responses from NLTK via requests

In part 1 of my Sentiment Analysis series I showed how I am sending myself emailed questions periodically and then storing my responses in a database to process the sentiment values over time.  In this post I want to show how to get the responses from the database and send them to be processed using the requests library in Python.

I am doing the sentiment analysis using Python’s NLTK 2.0 via an API call.  The advantage of doing it this way is that I don’t need to train classifiers myself, since the authors of that module/library already did so using Twitter data and movie reviews.  This means that if I expanded this project to include analysis of other people’s responses (i.e. users other than just me), we would all have our texts analyzed by a classifier that produces consistent positive/negative categorizations for everyone, rather than one skewed by having been trained on just my data/responses.  In other words, with a classifier trained only on my responses, I might end up being so negative that everyone else seems really positive, which is not quite right.

The script to make the API call is very simple.  I start out by fetching the rows from the database where the responses are collected and then send them one at a time to be processed by the sentiment analyzer:
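A hedged sketch of that loop is shown below. It uses `sqlite3` as a stand-in for the project database; the table and column names, the placeholder `API_URL`, the `text` form field and the JSON shape of the reply are all assumptions rather than the actual service details. The HTTP session is passed in, so a `requests.Session()` can be supplied when running it for real.

```python
import sqlite3

API_URL = "https://example.com/api/sentiment/"  # placeholder, not the real endpoint


def fetch_responses(conn):
    """Fetch stored responses, oldest first (table and columns are assumed)."""
    return conn.execute(
        "SELECT author, received_at, body FROM responses ORDER BY received_at"
    ).fetchall()


def score_text(text, session, url=API_URL):
    """POST one response to the sentiment API and return the parsed JSON."""
    reply = session.post(url, data={"text": text}, timeout=10)
    reply.raise_for_status()  # fail loudly on HTTP errors
    return reply.json()


def score_all(conn, session):
    """Send each stored response to the API, one at a time."""
    results = []
    for author, received_at, body in fetch_responses(conn):
        results.append((author, received_at, score_text(body, session)))
    return results


# Usage (needs the requests library, network access and a real endpoint):
#   import requests
#   with requests.Session() as session:
#       for author, received_at, result in score_all(conn, session):
#           print(author, received_at, result)
```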

This script produces an output that shows the response author, the date and time of the response and the positive/negative score returned by NLTK 2.0.  The output of the above script looks like this:

These outputs are then inserted into a MySQL table to be used in later analytical steps that I will describe in future posts.

UPDATE:  Screenshot of the responses in a table in the database.  Now that I see it starting to come together I can think of a thousand fun possibilities for additional data points, e.g. the weather at the time of the response or device type that the response was written on (mobile, desktop, etc).

NLTK sentiment over time
NLTK sentiment values for the last four responses

 


Sentiment Analysis on Email Inbox: Questions

In my post introducing the project of doing sentiment analysis on my email inbox I mention that I am emailing myself questions to answer (to generate data for analysis) every two hours.  I am posting the current list of questions here but over time the list might expand to include more topics.  My script chooses from among these questions (below) randomly so that I will end up generating responses for a wide variety of topics.  Another outstanding task is to classify the questions so that I can track overall sentiment by type of topic.
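The random selection step can be sketched in a few lines with `random.choice`. The short `QUESTIONS` list is just a sample of the full list, and the function name is an assumption.

```python
import random

# A small sample of the question list, for illustration.
QUESTIONS = [
    "Do you read newspapers?",
    "How ambitious are you?",
    "What is your favorite sport?",
]


def pick_question(questions=QUESTIONS, rng=random):
    """Pick one question uniformly at random; repeats across emails are possible."""
    return rng.choice(questions)


print(pick_question())
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible when testing.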


Sentiment Analysis Questions


Do you own or use a desktop computer or a laptop?
Do you read newspapers?
Do you read? Has any particular book influenced you or left a life-changing impact?
Which are your favorite authors?
Do you see your father as the head of the family?
Do you smoke?
Do you want to have children? Why?
Have you ever been on a blind date?
Have you ever been on the wrong side of the law?
Have you ever been part of a not-for-profit organization and done volunteer work?
Have you ever been to a school or college reunion?
Have you ever had a narrow escape from death?
Have you ever taken a sabbatical?
Have you ever undergone therapy of any sort?
How ambitious are you?
How do you like to celebrate your birthday?
How do you feel about divorce?
How do you like to celebrate the New Year?
How important is money to you?
How important is your family to you?
How is your relationship now with your siblings?
How long are you on the internet every day?
How many credit cards do you own?
How many lovers have you had so far?
How often do you meet your parents?
How old were you when you started dating?
How patient are you?
How religious are you? Do you pray regularly?
How romantic are you?
How tall do you want your mate to be?
If a relative died and left you a million dollars, what would you do with the money?
If you could change one event in history what would it be?
If you could go on a vacation anywhere in the world where would it be? What stops you?
If you could know one historical figure very intimately, who would that be?
What are your political leanings like?
What did you do during the summers when you were growing up?
What do you consider the five biggest drawbacks of your personality?
What do you consider the five biggest strengths of your personality?
What do you consider your biggest achievement to date?
What do you do for a living? What would you rather do, if money was not a consideration?
What do your parents do?
What has been your most generous act of charity, whether in cash or kind, yet?
What is the most adventurous thing you have ever done?
What is the most expensive gift you have ever bought for someone?
What is the smartest thing you have ever done?
What is the movie you have loved the most and would watch again, given a choice? Why?
What is your favorite sport?
What is your idea of a perfect evening?
What is your relationship like now with your father?
What is your relationship like now with your mother?
What was it like, growing up with your siblings?
What was your relationship like with your father when you were growing up?
What was your relationship like with your mother when you were growing up?
What would you do if your best friend disapproved of me?
What is the best compliment you have ever received?
What is the highest educational qualification you obtained?
What is the worst thing you ever did to a friend?
What is your earliest childhood memory?
When is your birthday?
When was the last time you cried?
When was the last time you felt really proud of yourself?
When was the last time you really laughed?
When was the last time you told a lie?
Where do you see yourself in five years’ time?
Which body part are you more likely to notice first in someone of the opposite sex?
Which celebrity do you admire?
Which has been your longest romantic relationship, so far?
Which is the largest loan you have ever taken?
Which is the largest purchase you have ever made?
Which is your favorite magazine?
Which personality traits do you want your partner to have? Which of these are an absolute must?
Which religion do you belong to?
Which was the last really impulsive thing you did? Are you an impulsive person?
Would you prefer to marry someone who belongs to the same religion as you?
Would you rather live in a large urban city, a small town or in the countryside?