Maximizing Memory Efficiency in Clinical Data Management: Worker Self-Awareness and Pool Management for Rapid Release

About the Client
Our client is a leading imaging informatics company based in the USA, specializing in comprehensive tissue-to-human imaging, analysis, and data management solutions. They serve pharmaceutical and biotech clients across all stages of the drug development pipeline.
The client supports its operations with two flagship platforms:
A multi-modality post-processing suite for imaging data (e.g., SPECT, CT, PET, MR).
A web-based platform for managing and reporting medical imaging data and metadata.
The Challenges
One of our client's most important products uses a job queue system to manage background tasks efficiently. However, we identified memory-related issues, specifically with job workers that were not releasing memory after completing certain processes. This behavior caused the system's memory usage to accumulate over time, forcing the container to swap memory and eventually become unresponsive.
After reviewing the most significant memory spikes, we found that certain reports consumed large amounts of memory. When processing all the data from a project, a single execution of one of these reports could use over 3GB of memory. The main issue was that this memory was not released after the report was generated.
Memory usage rose rapidly and consistently over a short period of time.

The Solution
We set out to improve the existing job queue system by introducing two key features:
Worker self-awareness: a worker retires itself when it detects excessive memory usage.
Worker pool management: the correct number of workers of each type is always running.

A worker would not be retired while actively processing a task. No new operational limit would be imposed on it, and it would continue to function normally.
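The two features described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the client's implementation: the `Worker` class, the `read_memory` probe, the `limit_bytes` threshold, and the `workers_to_start` helper are all hypothetical names, and the choice to check memory only between jobs is one possible way to guarantee that a busy worker is never retired.

```python
from typing import Callable, Dict, Iterable


class Worker:
    """A job worker that retires itself once its memory use exceeds a limit."""

    def __init__(self, kind: str, read_memory: Callable[[], int], limit_bytes: int) -> None:
        self.kind = kind
        self.read_memory = read_memory  # hypothetical probe returning bytes in use
        self.limit_bytes = limit_bytes  # hypothetical threshold, not taken from the case study
        self.retired = False

    def run(self, jobs: Iterable[Callable[[], None]]) -> int:
        """Run jobs one at a time and return how many were completed."""
        completed = 0
        for job in jobs:
            job()  # a running job is never interrupted
            completed += 1
            # Memory is checked only between jobs, so retirement cannot cut a task short.
            if self.read_memory() > self.limit_bytes:
                self.retired = True  # a real worker process would exit here, freeing its memory
                break
        return completed


def workers_to_start(desired: Dict[str, int], running: Dict[str, int]) -> Dict[str, int]:
    """Pool management: how many workers of each type must be started to reach the target."""
    return {kind: max(0, want - running.get(kind, 0)) for kind, want in desired.items()}
```

In a sketch like this, a pool manager would call `workers_to_start` periodically and launch a fresh process for every missing worker, so that a worker retired for high memory usage is replaced by one that starts from a clean memory footprint.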
The Outcome
After the final version of our proposed solution to the memory leak was released into production, we observed important improvements:
The job queue's memory usage remained under 25GB.
Memory is now released quickly and no longer reaches the previous usage levels, which generally exceeded 44GB. As a consequence, users no longer need to restart the system every 15 days to free memory.
