Identifying Homeless Youth at-risk of Substance Use Disorder: Data Driven Insights for Policymakers.

Substance Use Disorder (SUD) is a devastating disease that leads to significant mental and behavioral impairments. It affects homeless youth more severely than their stably housed counterparts because of their high-risk behaviors. To help policymakers devise effective long-term strategies to mitigate SUD, it is necessary to critically analyze the environmental, psychological, and other factors associated with SUD among homeless youth. Unfortunately, there is no definitive data-driven study of these factors. The few prior studies (i) do not analyze how the factors associated with SUD vary across geographic locations; and (ii) consider only a few contributing factors in relatively small samples. This work aims to fill this gap with three contributions: (i) we use a real-world dataset collected from ∼1,400 homeless youth (across six American states) to build accurate Machine Learning (ML) models for predicting the susceptibility of homeless youth to SUD; (ii) we find a representative set of factors associated with SUD among this population by analyzing the feature importance values of our ML models; and (iii) we investigate the effect of geographical heterogeneity on the factors associated with SUD. Our results show that adaptively boosted decision trees achieve the best predictive performance among several algorithms on the SUD prediction task, with an Area Under the ROC Curve of 0.85. We also find, among other things, that both Post-Traumatic Stress Disorder (PTSD) and depression are very strongly associated with SUD among homeless youth because of their propensity to self-medicate to alleviate stress. This work was done in collaboration with social work scientists, who are currently evaluating the results for potential future deployment.
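To make the reported metric concrete, the following is a minimal sketch of how an Area Under the ROC Curve value can be computed from predicted risk scores. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration. The sketch uses the pairwise-ranking interpretation of the metric, where the AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def roc_auc(labels, scores):
    """Area Under the ROC Curve via pairwise ranking.

    Returns the fraction of (positive, negative) pairs in which the
    positive example has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = has SUD, 0 = does not.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 14 of 16 pairs are ranked correctly -> 0.875
```

Under this reading, an AUC of 0.85 means that a model ranks a randomly chosen positive case above a randomly chosen negative case about 85% of the time, while 0.5 corresponds to random ranking.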

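© Tabar. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, https://doi.org/10.1145/3394486.3403360.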


  • kdd2020-1.pdf

    size: 2.95 MB | mime_type: application/pdf | date: 2021-09-09


Work Title: Identifying Homeless Youth at-risk of Substance Use Disorder: Data Driven Insights for Policymakers.
Access: Open Access
Creators:
  1. Maryam Tabar
  2. Heesoo Park
  3. Stephanie Winkler
  4. Dongwon Lee
  5. Anamika Barman-Adhikari
  6. Amulya Yadav
License: In Copyright (Rights Reserved)
Work Type: Article
Publisher: ACM
Publication Date: July 6, 2020
Publisher Identifier (DOI): 10.1145/3394486.3403360
Published in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
Deposited: September 09, 2021



