October 20, 2014

How to train your inbox using Microsoft Azure Machine Learning (part two)

Theta

In this three-part series, Theta Software’s product architect and lead consultant Jim Taylor takes a closer look at Microsoft Azure Machine Learning. New here? Read part one first: What is machine learning and how might it be useful?

Could I train my inbox using Microsoft Azure Machine Learning?

When I’m looking at new technologies or trying to get to grips with a new concept, I like to get my hands dirty and try things out for myself. So I set about finding a problem and solution that could make use of a predictive analysis experiment.

The problem...

At Theta we have people based at various locations: working in our two offices, working from home, working at client sites and so on.

Most people are in the habit of sending what I call “whereabouts emails” with subject lines like:

<out> An hour or so

<WFH> Sick today

<WFH>

<late>in the office from 9.45am

Out-ClientName

WFH

<in later> In after lunch. At XXX from 11:30am to 1:30pm then office

Bob on leave until 3rd

Out for lunch

As you’d expect, I have a rule set up for this. Most people use the convention of angle brackets to denote WFH (working from home), OUT etc., and these emails are pretty much never sent directly to me.

Apart from the odd false positive and the emails that slip through the rule, this works well enough most of the time.
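To make the limits of an explicit rule concrete, here is a minimal sketch of an angle-bracket rule in Python. The actual Outlook rule is not shown in this post, so the pattern and the `matches_rule` helper are assumptions made purely for illustration.

```python
import re

# Hypothetical stand-in for the mail rule: a subject that starts with a
# short tag in angle brackets, such as <WFH>, <out> or <in later>.
TAG_RULE = re.compile(r"^\s*<[^<>]{1,20}>")


def matches_rule(subject: str) -> bool:
    """Return True when the subject starts with an angle-bracket tag."""
    return TAG_RULE.match(subject) is not None


# A few of the subject lines listed above.
subjects = [
    "<out> An hour or so",
    "<late>in the office from 9.45am",
    "Out-ClientName",
    "WFH",
    "Bob on leave until 3rd",
]

for subject in subjects:
    print(matches_rule(subject), subject)
```

With this pattern the first two subjects match, while "Out-ClientName", "WFH" and "Bob on leave until 3rd" slip through, which is the kind of gap a trained model might close.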

So it turns out I have a readily available source of training and test data (my inbox and whereabouts folder) and a problem looking for a solution. Can I make an experiment that predicts whether an email is a “whereabouts email” without explicitly defining rules? Can I train a model by using the emails in my inbox as training and test data – by providing details of all emails and those that have ended up in my whereabouts folder?

Designing the experiment

The first step is to decide what data to use in the experiment. What attributes of each email in my inbox and other folders would provide useful indicators?

I decided to try the following:

Is in whereabouts folder – this is what we are predicting so we provide this to train the model.
Has attachments – Whereabouts emails tend not to have attachments
Sent direct – Whereabouts emails tend to be sent to groups of people rather than directly to me
May contain a time – The subject has text which contains something which may be interpreted as a time e.g. 1pm, 12:30
Is reply or forward – Whereabouts emails tend not to be a reply or forwarded email
Received day of week – Interested to see if this is a factor
Received hour – Interested to see if there is a pattern here (I tend to see these emails arrive in the morning and evening)
Subject word count – Tends to be low
Body word count – Tends to be low
Sender domain – Usually from the company domain
Has CC – Whereabouts emails rarely contain a CC
Importance – Interested to see if this is a factor
Body format – Interested to see if this is a factor
Special character count – Included because of the angle-bracket convention. It may help without being so specific that it leads the experiment too much.
Subject number count – Included to see whether this is a factor
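Most of the attributes above can be derived from a handful of email fields. The following is a minimal Python sketch of that derivation, not the console application itself. The `Email` record, the column names, the time pattern and the reading of "subject number count" as the number of digit runs are all assumptions. Importance, body format and the whereabouts-folder label are left out because they would be read straight from the mail client.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Email:
    """Hypothetical minimal email record (not the author's data model)."""
    subject: str
    body: str
    sender: str
    received: datetime
    to: list = field(default_factory=list)
    cc: list = field(default_factory=list)
    attachment_count: int = 0


# Rough pattern for text that may be a time: 1pm, 9.45am, 11:30am, 12:30.
TIME_PATTERN = re.compile(
    r"\b\d{1,2}([:.]\d{2})?\s?(am|pm)\b|\b\d{1,2}:\d{2}\b", re.IGNORECASE
)
REPLY_PREFIXES = ("re:", "fw:", "fwd:")
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday")


def extract_features(email: Email, my_address: str) -> dict:
    """Turn one email into a flat row of candidate features."""
    subject = email.subject.strip()
    recipients = {address.lower() for address in email.to}
    return {
        "HasAttachments": email.attachment_count > 0,
        "SentDirect": my_address.lower() in recipients,
        "MayContainTime": TIME_PATTERN.search(subject) is not None,
        "IsReplyOrForward": subject.lower().startswith(REPLY_PREFIXES),
        "ReceivedDayOfWeek": DAYS[email.received.weekday()],
        "ReceivedHour": email.received.hour,
        "SubjectWordCount": len(subject.split()),
        "BodyWordCount": len(email.body.split()),
        "SenderDomain": email.sender.rsplit("@", 1)[-1].lower(),
        "HasCC": len(email.cc) > 0,
        # Anything that is neither a letter, a digit nor whitespace.
        "SpecialCharacterCount": sum(
            1 for ch in subject if not ch.isalnum() and not ch.isspace()
        ),
        # Assumption: count runs of digits rather than individual digits.
        "SubjectNumberCount": len(re.findall(r"\d+", subject)),
    }


example = Email(
    subject="<late>in the office from 9.45am",
    body="See you later",
    sender="bob@example.com",
    received=datetime(2014, 10, 20, 8, 15),
    to=["team@example.com"],
)
print(extract_features(example, my_address="me@example.com"))
```

The addresses and timestamp in the example are made up. Keeping each feature as a simple boolean, number or short string makes the rows easy to write out as a flat table.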

So I created a console application that uses Outlook Office automation to iterate over all folders and emails in my inbox and produce CSV output.

The application source code can be found on GitHub.

The resulting dataset (I had 2000+ rows) can be uploaded to Azure Machine Learning as a dataset.
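The CSV step itself is simple. As a rough sketch, assuming each email has already been reduced to a dictionary of features, the rows could be written with Python's standard `csv` module. The column names and values below are invented for illustration and are not taken from the real dataset.

```python
import csv
import io

# Hypothetical feature rows; column names and values are illustrative only.
rows = [
    {"IsWhereabouts": 1, "HasAttachments": 0,
     "SubjectWordCount": 3, "SenderDomain": "example.com"},
    {"IsWhereabouts": 0, "HasAttachments": 1,
     "SubjectWordCount": 9, "SenderDomain": "example.org"},
]


def write_dataset(rows, stream):
    """Write feature rows as CSV, using the first row's keys as the header."""
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer here; a real run would write to a file.
buffer = io.StringIO()
write_dataset(rows, buffer)
print(buffer.getvalue())
```

To produce a file for upload, the same function can be called with a handle opened as `open("emails.csv", "w", newline="")`, where the file name is again just a placeholder.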

In my next post, I’ll go step by step through how to run this experiment, evaluate it and publish it as a web service.

Go to part three of the Microsoft Azure Machine Learning series

Part 3 here
