Machine Learning Categorization Applications in Dealing with Email Overload

This paper describes classification criteria for machine learning models that address the problem of email overload, which is common among personal and professional mail users. In it, I compare four machine learning models (Multinomial Naïve Bayes, Random Forest, K Nearest Neighbor, and Decision Tree) on their performance in each mail category. The categories exist to sort mail effectively and reduce the time users spend on their inbox. They are primary, secondary, social, promotional, and spam; each is explained in more detail in the research design under classification criteria. Overall, the findings show that Random Forest performs best, with a 96% weighted average F1 score. While K Nearest Neighbor performs best on recall, its precision scores are noticeably worse than those of Random Forest. Promotional mail, which had around 12,000 items, had the best F1 score at 99%, whereas spam performed the worst with a 74% F1 score. Results can likely be improved by increasing the total number of spam mail items and by using deep learning solutions.
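
To make the headline metric concrete, the following is a minimal sketch of how a weighted average F1 score can be computed across the five mail categories. It assumes the common convention in which each category's F1 score is weighted by its number of true items (what scikit-learn calls "weighted" averaging); the abstract does not state which convention the paper uses. The function names and the toy labels are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label in y_true."""
    support = Counter(y_true)        # true items per category
    predicted = Counter(y_pred)      # predicted items per category
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    scores = {}
    for label in sorted(support):
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / support[label]
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, support[label])
    return scores


def weighted_f1(y_true, y_pred):
    """Average the per-category F1 scores, weighted by category support."""
    scores = per_class_scores(y_true, y_pred)
    total = sum(support for _, _, _, support in scores.values())
    return sum(f1 * support for _, _, f1, support in scores.values()) / total


# Toy labels, invented for illustration only (not the paper's data).
y_true = ["promotional"] * 4 + ["spam"] * 2 + ["primary"] * 2
y_pred = ["promotional"] * 4 + ["spam", "primary"] + ["primary"] * 2

if __name__ == "__main__":
    for label, (p, r, f1, n) in per_class_scores(y_true, y_pred).items():
        print(f"{label:12s} precision={p:.2f} recall={r:.2f} f1={f1:.2f} n={n}")
    print(f"weighted F1 = {weighted_f1(y_true, y_pred):.3f}")
```

Under this convention a large, well-classified category such as promotional mail pulls the weighted average up, while a small category such as spam has little effect on it even when its own F1 score is low. That is consistent with a high overall score coexisting with a much lower spam score.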

Metadata

Work Title: Machine Learning Categorization Applications in Dealing with Email Overload
Access: Open Access
Creators:
  1. Robert Masters
Keyword:
  1. Natural Language Processing (NLP)
  2. Machine Learning
  3. Email Classification
  4. Multiclassification
  5. Spam
License: CC BY 4.0 (Attribution)
Work Type: Masters Culminating Experience
Publication Date: December 4, 2022
DOI: doi:10.26207/aqwc-nk52
Deposited: January 03, 2023

Work History

Version 1
published

  • Created
  • Updated
  • Updated Acknowledgments
    Acknowledgments
    • Parag Pendharkar
  • Added Creator Robert Masters
  • Added Creator Emily Mross
  • Added Masters_Robert.pdf
  • Updated License
    License
    • https://creativecommons.org/licenses/by/4.0/
  • Published
  • Updated

Version 2
published

  • Created
  • Deleted Creator Emily Mross
  • Published
  • Updated Publication Date
    Publication Date
    • 2022
    • 2022-12-04
  • Updated Acknowledgments
    Acknowledgments
    • Parag Pendharkar
  • Updated