MSc thesis project proposal
Unstructured Clinical Text as Source for Improving Surgery Duration Modelling
Project outside the university
Erasmus MC
In this project, the aim is to explore which parts of the clinical free text actually help predict how long a surgery will take, and to find ways to extract that information. You'll do this by extracting different kinds of features from the textual data and then testing how informative they are.
Assignment
To pull useful information out of the free-text notes, you will work with a mix of natural language processing (NLP) techniques and input from clinical experts. You’ll start with preprocessing steps, such as cleaning the text, removing repeated boilerplate phrases, and ensuring everything is properly de-identified. After that, you’ll try different ways of turning the text into features you can use in your models. This might include simple methods such as extracting keywords, computing TF-IDF scores, or setting up rules to detect clinically important phrases like “difficult airway” or “previous complications.” You can also experiment with more advanced methods, such as word embeddings or pretrained clinical language models, to capture the deeper meaning behind the text. All of these steps will help you figure out which parts of the notes carry information that could influence how long a surgery takes. For this purpose, textual data from surgeries performed over the last 10 years is available.
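The simpler steps described above (cleaning, rule-based phrase detection, and TF-IDF) could be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's actual pipeline: the boilerplate pattern, the rule names, and the two example notes are all hypothetical, and de-identification is not covered.

```python
import math
import re
from collections import Counter

# Hypothetical boilerplate pattern and clinical rules; real lists would come
# from inspecting the notes and from clinician input.
BOILERPLATE = [r"patient seen at pre-operative clinic\.?"]
RULES = {
    "difficult_airway": re.compile(r"\bdifficult airway\b"),
    "previous_complications": re.compile(r"\bprevious complications?\b"),
}

def clean(note):
    """Lowercase, drop boilerplate phrases, and collapse whitespace."""
    text = note.lower()
    for pattern in BOILERPLATE:
        text = re.sub(pattern, " ", text)
    return re.sub(r"\s+", " ", text).strip()

def rule_flags(text):
    """Return one binary feature per rule (1 if the phrase occurs)."""
    return {name: int(bool(rx.search(text))) for name, rx in RULES.items()}

def tfidf(docs):
    """Plain TF-IDF per document: term frequency times log(N / document frequency)."""
    tokenised = [re.findall(r"[a-z]+", d) for d in docs]
    n_docs = len(tokenised)
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty notes
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights

# Made-up example notes, for illustration only.
notes = [
    "Patient seen at pre-operative clinic. Difficult airway noted.",
    "Patient seen at pre-operative clinic. No previous complications reported.",
]
cleaned = [clean(n) for n in notes]
flags = [rule_flags(t) for t in cleaned]
weights = tfidf(cleaned)
```

Note that the second example note says "No previous complications", yet the naive rule still sets the `previous_complications` flag. Plain pattern matching does not handle negation, which is one reason to validate such rules with clinicians or to move to context-aware models.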
Along the way, you’ll also check with clinicians, such as surgeons or anesthesiologists, to make sure that the patterns your models highlight make sense in real practice. It is even possible to observe surgeries yourself and identify in practice which aspects influence surgery duration. By combining what you learn from the text with the structured data we already have, you will contribute to building a more complete and accurate system for predicting surgery duration.
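One simple way to test whether a text-derived feature adds information beyond the structured data is to compare the prediction error with and without it. The sketch below assumes a single structured field (a procedure code) and a single binary text flag, and uses group means as a stand-in for a real model. The records are synthetic and made up purely for illustration, and `group_mean_mae` is a hypothetical helper name.

```python
from statistics import mean

# Synthetic, made-up records purely for illustration:
# (procedure code, text-derived flag, duration in minutes).
records = [
    ("A", 0, 60), ("A", 0, 64), ("A", 1, 90), ("A", 1, 94),
    ("B", 0, 120), ("B", 0, 124), ("B", 1, 150), ("B", 1, 154),
]

def group_mean_mae(rows, key):
    """Mean absolute error when each row is predicted by its group's mean duration."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row[2])
    means = {k: mean(v) for k, v in groups.items()}
    return mean(abs(row[2] - means[key(row)]) for row in rows)

# Baseline: structured data only (group by procedure code).
mae_structured = group_mean_mae(records, key=lambda r: r[0])
# Combined: procedure code plus the text-derived flag.
mae_combined = group_mean_mae(records, key=lambda r: (r[0], r[1]))
```

Here both errors are computed on the same records used to form the group means, so the comparison is optimistic. In a real study the comparison would be made on held-out data or with cross-validation, and with a proper regression model in place of group means.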
This project is conducted in collaboration with Cindy Pistorius (TU Delft, Erasmus MC) and Dr. Niki Ottenhof (Erasmus MC).
Requirements
Basic understanding of large language models (LLMs) and experience with machine learning.
Contact
dr.ir. Justin Dauwels
Signal Processing Systems Group
Department of Microelectronics
Last modified: 2025-12-11