Machine Learning Categorization Applications in Dealing with Email Overload
This paper describes classification criteria for machine learning models that address email overload, a problem common to both personal and professional mail users. I compare four machine learning models (Multinomial Naïve Bayes, Random Forest, K Nearest Neighbor, and Decision Tree) on their performance in each mail category. The categories exist to sort mail effectively and so reduce the time users spend on their inbox. They are primary, secondary, social, promotional, and spam; each is explained in more detail in the research design under classification criteria. Overall, Random Forest performs best, with a 96% weighted average F1 score. While K Nearest Neighbor performs best on recall, its precision scores are noticeably worse than those of Random Forest. Promotional mail, which had around 12,000 items, had the best F1 score at 99%, whereas spam performed the worst with a 74% F1 score. Results can likely be improved by increasing the total number of spam mail items and by using deep learning solutions.
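The weighted average F1 score reported above weights each category's F1 by the number of items in that category, which is one reason a weak score on a small class such as spam can coexist with a high overall figure. The following is a minimal sketch of that calculation in plain Python, not the code used in the paper. The function names `per_class_f1` and `weighted_f1` and the ten toy labels are hypothetical and chosen only for illustration; they are not the paper's data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label that occurs in y_true."""
    support = Counter(y_true)
    scores = {}
    for label in sorted(support):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each by its class support."""
    scores = per_class_f1(y_true, y_pred)
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total


# Hypothetical toy labels (not the paper's data): a large class and a small one.
y_true = ["promotional"] * 8 + ["spam"] * 2
y_pred = ["promotional"] * 8 + ["spam", "promotional"]

for label, (f1, n) in per_class_f1(y_true, y_pred).items():
    print(f"{label}: F1={f1:.3f} (support={n})")
print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
```

In this toy case the small class scores much lower than the large one, yet the weighted average stays close to the large class's score because it contributes most of the items.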
|CC BY 4.0 (Attribution)
|Masters Culminating Experience
|December 4, 2022
|January 03, 2023