Fine-Tuned BERT Models for Medical Entity Extraction

The rapid growth of unstructured clinical data makes it difficult to extract the insights needed for patient care, research, and operational efficiency. This study investigates transformer-based language models, specifically fine-tuned BERT variants, for medical Named Entity Recognition (NER), focusing on the extraction of entities such as diseases, medications, and procedures from biomedical text. It examines the domain-adapted models BioBERT, ClinicalBERT, and PubMedBERT, emphasizing their stronger contextual understanding and performance relative to general-purpose models. The experiments use the PubMed 200k RCT dataset and compare architectures ranging from a traditional TF-IDF representation with a Random Forest classifier to token-based, character-based, and hybrid bidirectional LSTM models. In an evaluation based on precision, recall, and F1-score, the hybrid model showed the highest accuracy and robustness, particularly on complex and ambiguous sentence structures. Integration with medical ontologies such as SNOMED CT and RxNorm further improved model interpretability and interoperability. The paper also addresses ethical considerations, including bias mitigation, privacy, and regulatory compliance. Future directions include multimodal data, federated learning, explainable AI, and real-time adaptive models. The findings underscore the potential of fine-tuned BERT models for structuring clinical narratives and, ultimately, for contributing to safer, more efficient, data-driven healthcare systems.
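The abstract names precision, recall, and F1-score as the evaluation metrics but does not say how they were computed. As a rough illustration only, the sketch below shows one common convention for NER: exact-match, entity-level scoring over BIO-tagged sequences. The BIO scheme, the function names `bio_to_spans` and `precision_recall_f1`, and the labels `DISEASE` and `DRUG` are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Set, Tuple

# (start_token, end_token_exclusive, entity_type); a hypothetical span format
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of entity spans.

    Assumption: an I- tag that does not continue an open entity of the
    same type is ignored rather than treated as the start of an entity.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match scores: a prediction counts only if its boundaries and type both match."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative tags only; not data from the paper.
gold_tags = ["B-DISEASE", "I-DISEASE", "O", "B-DRUG", "O"]
pred_tags = ["B-DISEASE", "I-DISEASE", "O", "O", "B-DRUG"]

scores = precision_recall_f1(bio_to_spans(gold_tags), bio_to_spans(pred_tags))
print(scores)  # (0.5, 0.5, 0.5)
```

In this toy case the disease span matches exactly, while the drug span is predicted one token too late, so it counts as both a false positive and a false negative. Token-level or partial-match scoring would give different numbers, which is why the convention should be stated alongside any reported F1.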

Metadata

Work Title: Fine-Tuned BERT Models for Medical Entity Extraction
Access: Open Access
Creators:
  1. Mohammad Yusuf Qidwai
Keywords:
  1. Medical Named Entity Recognition (NER)
  2. BioBERT
  3. ClinicalBERT
  4. PubMedBERT
  5. Transformer Models
  6. Biomedical NLP
  7. Healthcare AI
  8. LLMs
License: CC BY 4.0 (Attribution)
Work Type: Masters Culminating Experience
Sub Work Type: Scholarly Paper/Essay (MA/MS)
Program: Information Systems
Degree: Master of Science
Acknowledgments:
  1. Girish Subramanian
Publisher:
  1. ScholarSphere
Publication Date: April 2025
DOI: doi:10.26207/4q6x-3575
Deposited: April 23, 2025

Work History

Version 1
published

  • Created
  • Updated
  • Updated Keyword, Degree, Program, and 3 more
    Keyword
    • Medical Named Entity Recognition (NER), BioBERT, ClinicalBERT, PubMedBERT, Transformer Models, Biomedical NLP, Healthcare AI, LLMs
    Degree
    • Master of Science
    Program
    • Information Systems
    Sub Work Type
    • Scholarly Paper/Essay (MA/MS)
    Publication Date
    • 2025-04
  • Updated Acknowledgments
    Acknowledgments
    • Girish Subramanian
  • Added Creator Mohammad Yusuf Qidwai
  • Added Creator Emily Mross
  • Added Qidwai_2C_Mohammad_Yusuf (1).pdf
  • Updated License
    License
    • https://creativecommons.org/licenses/by/4.0/
  • Published Publisher
    Publisher
    • ScholarSphere
  • Updated
  • Updated
  • Deleted Creator Emily Mross