Clinical discharge letters are among the richest sources of patient information in any health system — and among the least accessible to computational analysis. Dense, abbreviated, inconsistent, and written in highly specialised medical language, they resist standard NLP approaches. This is our account of building a pipeline that works.
The Italian Clinical Notes Challenge
Italian clinical discharge letters present a particularly challenging NLP target. Written by clinicians under time pressure, they combine formal medical terminology, colloquial abbreviations, Latin phrases, numerical lab values, and patient-specific shorthand into documents that vary enormously in structure and length. A letter for a cardiac patient might run to four pages with structured sections; one for an emergency admission might be a single dense paragraph.
The low-resource challenge compounds this. While English biomedical NLP benefits from large annotated corpora like MIMIC-III and i2b2, Italian clinical text is largely proprietary, scattered across hospital systems, and rarely annotated. Training robust models therefore requires creative strategies for coping with data scarcity.
Named Entity Recognition for Medical Concepts
The first task in our pipeline is named entity recognition (NER) — identifying spans of text that refer to clinical concepts such as diagnoses, medications, procedures, and anatomical locations. We address this with a transformer model fine-tuned on a proprietary annotated corpus developed in collaboration with ICS Maugeri's clinical informatics team.
Our annotation schema covers seven entity types: diagnoses, symptoms, medications, dosages, procedures, anatomical sites, and clinical findings. Training data totals approximately 3,200 discharge letters with full span-level annotation — modest by English NLP standards, but sufficient for robust performance when combined with pre-training on Italian biomedical text.
We achieve F1 scores of 0.88 for diagnoses, 0.91 for medications, and 0.84 for procedures on our held-out test set. Performance drops to 0.76 for clinical findings — a heterogeneous category that includes laboratory abnormalities, imaging findings, and physiological measurements — reflecting the category's greater linguistic variability.
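Span-level F1 figures like these are conventionally computed over exact typed-span matches rather than per-token accuracy. As an illustration (a minimal sketch, not our actual evaluation code), the following decodes BIO-tagged sequences into typed spans and scores predictions against gold annotations:

```python
from typing import List

def bio_to_spans(tags: List[str]) -> set:
    """Decode BIO tags into (type, start, end) spans, end-exclusive."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and etype != tag[2:]):
            if start is not None:
                spans.add((etype, start, i))
                start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            start, etype = i, tag[2:]  # tolerate I- without a preceding B-
    return spans

def span_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1: a span counts only if type and boundaries agree."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    prec, rec = tp / len(p), tp / len(g)
    return 2 * prec * rec / (prec + rec)
```

Under this strict criterion a prediction with the right type but off-by-one boundaries scores zero, which is one reason heterogeneous categories like clinical findings lag behind well-delimited ones like medications.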
"The hardest part of clinical NLP is not the model — it is the annotation. Getting clinicians to agree on what counts as a 'diagnosis' versus a 'finding' in free text requires months of inter-annotator agreement work."
— Lucia Sacchi, Scientific Consultant & Co-Founder
Temporal Reasoning in Discharge Letters
Identifying what happened is only half the problem. Clinical decision support also requires knowing when things happened and in what order — whether a medication was prescribed before or after a diagnosis, whether a symptom was present on admission or developed during hospitalisation, whether a finding is acute or chronic.
Temporal reasoning in clinical text is hard because clinicians use time in complex ways. "The patient, who has suffered from hypertension for twenty years, presented acutely with chest pain" encodes three temporal relations in a single sentence — and the model must disentangle them. We use a combination of a temporal expression normaliser (adapted from SUTime for Italian) and a neural relation classifier that identifies the temporal links between extracted entities and time expressions.
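To give a flavour of what the normaliser does (this toy is not our actual SUTime adaptation), the sketch below maps simple Italian duration phrases such as "da venti anni" ("for twenty years") to ISO 8601 durations. The `NUMBERS` and `UNITS` lexicons are illustrative stubs, not a complete Italian temporal grammar:

```python
import re
from typing import Optional

# Toy lexicons: Italian number words and time units (illustrative subset only).
NUMBERS = {"uno": 1, "due": 2, "tre": 3, "dieci": 10, "venti": 20, "trenta": 30}
UNITS = {"anno": "Y", "anni": "Y", "mese": "M", "mesi": "M", "giorno": "D", "giorni": "D"}

def normalise_duration(text: str) -> Optional[str]:
    """Map an Italian duration phrase (e.g. 'da venti anni') to an
    ISO 8601 duration (e.g. 'P20Y'). Returns None if unrecognised."""
    m = re.search(r"(?:da\s+)?(\w+)\s+(\w+)", text.lower())
    if not m:
        return None
    num_tok, unit_tok = m.groups()
    value = NUMBERS.get(num_tok)
    if value is None and num_tok.isdigit():
        value = int(num_tok)
    unit = UNITS.get(unit_tok)
    if value is None or unit is None:
        return None
    return f"P{value}{unit}"
```

Once expressions are normalised to machine-readable anchors like this, the relation classifier's job reduces to linking each extracted entity to the right anchor (before/after/during), which is how the hypertension-versus-chest-pain example above gets disentangled.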
Low-Resource Italian NLP
The scarcity of annotated Italian clinical text pushed us toward several strategies to make the most of limited supervision. Cross-lingual transfer from English biomedical NLP proved effective: starting from multilingual base models (mBERT and XLM-RoBERTa), the model carries over biomedical knowledge learned from English data even when fine-tuning is done on Italian text. Domain-adaptive pre-training on unannotated Italian clinical text, using masked language modelling, further narrows the gap.
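The masking step at the heart of masked language modelling can be sketched as follows. This is the standard BERT-style recipe (15% of positions selected as prediction targets; of those, 80% replaced with the mask token, 10% with a random token, 10% left unchanged); the function name and parameters here are ours, not any specific library's API:

```python
import random
from typing import List, Tuple

def mask_tokens(token_ids: List[int], vocab_size: int, mask_id: int,
                mlm_prob: float = 0.15, seed: int = 0) -> Tuple[List[int], List[int]]:
    """BERT-style masking. Returns (inputs, labels): labels are -100
    (ignored by the loss) except at selected positions, where they hold
    the original token the model must reconstruct."""
    rng = random.Random(seed)
    inputs, labels = list(token_ids), [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mlm_prob:
            labels[i] = tok          # this position contributes to the MLM loss
            r = rng.random()
            if r < 0.8:
                inputs[i] = mask_id  # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(vocab_size)  # 10%: random token
            # remaining 10%: keep the original token
    return inputs, labels
```

Running this over batches of unannotated discharge letters and training the model to recover the hidden tokens is what adapts the general multilingual model to the quirks of Italian clinical language.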
We also developed a data augmentation strategy specific to clinical text. By systematically replacing entity spans with semantically compatible alternatives from a medical ontology (SNOMED-CT Italian extension), we generate synthetic training examples that expose the model to a broader vocabulary while preserving document-level coherence.
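A minimal sketch of this augmentation, with a toy dictionary standing in for queries against the SNOMED-CT Italian extension (the terms, type labels, and function names here are illustrative, not our production code):

```python
import random
from typing import Dict, List, Tuple

# Toy stand-in for the ontology: entity type -> compatible surface forms.
# A real system would query the SNOMED-CT Italian extension instead.
ONTOLOGY: Dict[str, List[str]] = {
    "DIAGNOSIS": ["ipertensione arteriosa", "diabete mellito tipo 2",
                  "fibrillazione atriale"],
    "MEDICATION": ["ramipril", "warfarin", "metformina"],
}

def augment(text: str, entities: List[Tuple[int, int, str]], seed: int = 0) -> str:
    """Replace each annotated span (start, end, type) with a randomly chosen,
    semantically compatible term of the same type, keeping the rest intact."""
    rng = random.Random(seed)
    out, cursor = [], 0
    for start, end, etype in sorted(entities):
        out.append(text[cursor:start])
        original = text[start:end]
        alternatives = [t for t in ONTOLOGY.get(etype, []) if t != original]
        out.append(rng.choice(alternatives) if alternatives else original)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)
```

Because only the entity spans change while the surrounding clinical narrative is untouched, each synthetic letter stays grammatically and clinically coherent at the document level.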
Key Takeaways
- Italian clinical NER achieves F1 of 0.88+ for diagnoses and medications on held-out test data
- Temporal reasoning enables reconstruction of patient event timelines from discharge letters
- Cross-lingual transfer from English biomedical BERT models provides a strong low-resource baseline
- Domain-adaptive pre-training on unannotated Italian clinical text significantly boosts performance
- Annotation quality is the primary bottleneck — invest in inter-annotator agreement protocols from the start
Integration with ICS Maugeri
The pipeline is deployed at ICS Maugeri as part of a clinical information extraction system that processes new discharge letters within minutes of their creation. Extracted entities are structured into a patient summary view available to treating physicians, with links back to the source text spans for verification. The system processes approximately 200 letters per day across ICS Maugeri's seven Italian facilities.
Physician feedback has been positive, with particular appreciation for the medication extraction capability — reconciling medication lists across multiple hospitalisation episodes is a time-consuming task that the system automates reliably. The temporal reasoning module, while technically impressive, receives more mixed feedback: clinicians appreciate the timeline view but are cautious about trusting fine-grained temporal assignments without manual verification.
This is a recurring theme in clinical AI deployment: the features that are hardest to build are often not the ones clinicians value most. Continuous user feedback is as important as model performance metrics, and we have built a structured feedback loop into the ICS Maugeri deployment that feeds directly into our model retraining pipeline.