Trustworthiness and ethics in data analysis: the physicists’ approach to AI in medicine

Andrea Chincarini


1 Data analysis in modern medicine

Evidence-based medicine is becoming an increasingly important part of modern healthcare, with data-oriented approaches replacing tradition- and anecdote-based methods. By analyzing large amounts of patient data, doctors and researchers can make more informed decisions about the best way to treat patients, and this shift towards evidence-based care has already led to improved outcomes and reduced costs in many cases. These data analysis practices today fall under the umbrella term “Artificial Intelligence” (AI), a term that has become so pervasive that it is now commonly used to describe all types of data-driven analyses that may affect patient management, whether they are statistical in nature or rely on classical or deep machine learning techniques.

Nowadays AI in healthcare is already used to determine patients’ status, assess CT or MRI scans, or identify high-risk populations for population health. The AI industry is estimated to tap into a market worth several billion euros, and the last 10 years have seen an impressive number of scientific studies and investments in this field. Image analysis is one of the most prominent uses of AI, and several research programmes and growing businesses are now based on it. However, although artificial intelligence has proven valuable in various areas of healthcare, it cannot always be trusted. There are many risks associated with deploying AI in sensitive areas such as automatic diagnosis, where incorrect results could lead to serious health complications for patients. In other words, caution is still necessary when using AI in medicine. In our experience, batch effects (shorthand for the data characteristics, quality and sample selection typical of a given clinical center or scanner manufacturer) and acquisition-protocol issues are still open challenges that can be difficult to overcome when attempting to use AI for automatic diagnosis.

2 Nuclear Medicine neuroimaging

Nuclear Medicine (NM) is a branch of medicine that uses radiation to diagnose and treat disease. It is used in a wide range of conditions, including cancer, heart disease, and neurological disorders, and it is also applied to image the body’s internal organs and to assess the effectiveness of treatments. One of its key features is the ability to tag a chemical compound with a radioactive isotope, which is then injected into the patient. Scanners are sensitive to the gamma rays emitted by the decaying isotope and translate the compound’s spatial concentration in the body into a 3D image, which is then interpreted by the physician.

There is a large variety of chemical compounds as well as isotopes, and many more are being engineered. Each compound can be tailored to mark a specific biochemical process, so that NM imaging can assess in vivo dynamics of virtually all biological mechanisms.

PET (Positron Emission Tomography) and SPECT (Single Photon Emission Computed Tomography) are both NM imaging techniques used to produce detailed images of the inside of the body. The main difference between the two lies in the radiation exploited to create the images: PET relies on isotopes that emit positrons (particles with the same mass as electrons but with a positive charge), whose annihilation with electrons produces pairs of gamma rays detected in coincidence, while SPECT relies on isotopes that emit single gamma photons detected directly. Because of this difference, the images produced by the two techniques also have some key differences: PET images tend to have higher spatial resolution and sensitivity, while SPECT scanners are more widely available and less costly; both convey functional information, such as blood flow or metabolic activity.

When applied to brain imaging, NM scans are key to understanding brain function, aging and various disease processes, particularly those characterized by a molecular signature. Under this category falls the large family of neurodegenerative diseases (e.g. Alzheimer’s disease, Parkinson’s disease and the other progressive motor and dementia-type disorders), and it is no surprise that these are nowadays diagnosed and characterized with the help of NM neuroimaging.

Analyzing these scans requires not only a comprehensive understanding of the data and of the techniques used, but also the capability to translate complex technical concepts into clear and actionable insights for the physicians.

Our experience in data analysis has taught us that providing thorough and precise analysis is not enough: it is also vital to present the results in a way that nuclear medicine physicians can easily understand and act upon. Striking the right balance between the intricacy of the analysis and the simplicity of its presentation is crucial for the data to be used effectively, and we continuously strive to deliver more accurate and actionable results.

3 A multifaceted, independent approach

Precision in physics often comes from the possibility of repeating measurements and exploiting the independence of statistical errors. We apply this basic statistical concept to medical data analysis, but with a twist: rather than relying on a single method, we implement at least three approaches for every analysis and score them against normative data.

Studies have shown that a sizable portion of cases, especially those considered borderline by clinicians, can be misdiagnosed by a single clinician, even a highly experienced one. This has led our team to develop a comprehensive, automated analysis process that combines AI and non-AI algorithms in order to achieve more accurate results. This framework, sketched in code right after the list, consists of:

• at least one standard, non-AI analysis method based on solid clinical evidence and rooted in the consolidated practice;
• a fully data-driven approach using AI, radiomics and sophisticated algorithms;
• an algorithm that encapsulates clinical or physiological models as part of the analysis.
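
To make the structure concrete, the following sketch (in Python, with placeholder functions standing in for the actual pipelines, which are of course far more elaborate) shows how the three branches can be orchestrated: each method independently returns a quantitative score for the same case, and the three outputs are collected for comparison against normative data.

```python
import numpy as np

# Hypothetical placeholders for the three independent analysis branches.
# Only the structure of the framework is sketched here.

def standard_semiquantitative(image: np.ndarray) -> float:
    """Non-AI method rooted in consolidated clinical practice,
    e.g. a mean-uptake ratio over predefined regions."""
    return float(image.mean())                 # placeholder computation

def data_driven_ai(image: np.ndarray) -> float:
    """Fully data-driven branch (AI / radiomics); a stand-in only."""
    return float(np.percentile(image, 90))     # placeholder computation

def model_based(image: np.ndarray) -> float:
    """Branch encapsulating a clinical/physiological model; stand-in."""
    return float(image.std())                  # placeholder computation

def analyze_case(image: np.ndarray) -> dict:
    """Run the three independent branches on the same case."""
    return {
        "standard": standard_semiquantitative(image),
        "ai": data_driven_ai(image),
        "model": model_based(image),
    }

if __name__ == "__main__":
    case = np.random.default_rng(0).random((64, 64, 64))   # dummy image
    print(analyze_case(case))
```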

The model is then validated on a multi-center dataset containing approximately a thousand clinically validated cases. This provides the normative data against which each new case is contrasted and ensures that the model’s performance is consistent across different centers and populations. By using a multi-center dataset, we can evaluate the model in a real-world setting and verify that it generalizes to a wide range of patients; using clinically validated cases in the validation dataset lets us check that the model’s output agrees with the ground truth and therefore reaches a high level of accuracy. The software is engineered so that new cases are constantly uploaded and can be validated a posteriori by a panel of expert clinicians in a consensus round. This allows the model to continuously learn and improve its performance over time, and this validation process is essential to ensure that the model is reliable enough to support important medical decisions.
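
As a minimal sketch of how a single quantitative score can be contrasted with the normative data, the snippet below computes a z-score and a Gaussian-approximated percentile; the normative mean and standard deviation are illustrative numbers, and in practice they are estimated per region, per tracer and, where necessary, per center.

```python
from math import erf, sqrt

def normative_score(value: float, norm_mean: float, norm_std: float) -> dict:
    """Compare one quantitative score with a normative distribution.

    Returns the z-score and the (Gaussian-approximated) percentile of the
    case within the normative population.
    """
    z = (value - norm_mean) / norm_std
    percentile = 0.5 * (1.0 + erf(z / sqrt(2.0))) * 100.0
    return {"z_score": z, "percentile": percentile}

# Illustrative numbers only: a regional uptake ratio of 1.08 contrasted with
# a normative distribution of mean 1.25 and standard deviation 0.07.
print(normative_score(1.08, norm_mean=1.25, norm_std=0.07))
```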

Finally, these analysis steps are combined into a comprehensive model and graphical representation to ensure that any conclusion is robust, trustworthy, and ethical.

We physicists often remind ourselves that “When you can measure what you are speaking about, and express it in numbers, you know something about it”. By introducing this physics approach to AI in medicine, we strive to bridge the gap between traditional and modern methods. This allows us to create a powerful and more complete picture of a patient’s condition while maintaining professional integrity. We are convinced that by using this framework we can provide more accurate, reliable and meaningful results for our patients.

The key ingredient is to use the algorithm not to deliver a definite answer in terms of patient condition or disease probability, but to provide robust and reliable quantification.

Quantification is the means to leverage both standard and more sophisticated AI-driven approaches without overstepping the boundary between the clinician’s expertise and their duty to interpret the data and deliver the diagnosis:

• quantification enables practitioners to visualize and measure data more accurately, allowing for greater precision in diagnosis and treatment decisions;
• it allows physicians to identify subtle differences between healthy and unhealthy tissue, leading to better disease management and earlier intervention;
• it can help reduce radiation exposure by enabling shorter scanning sessions;
• with quantification, it is possible to track changes over time, making long-term monitoring easier;
• quantification also facilitates patient education about their condition and helps improve communication between healthcare providers and patients.

The use of more than one analysis tool is actually more important when the quantification methods disagree than when they agree. Since each method is based on its own independent approach, a disagreement is a significant indication of analysis issues such as poor data quality, peculiar patient anatomy, or a non-standard presentation of the disease pattern. By using multiple analysis tools, we can cross-reference the results and identify discrepancies, which can then be investigated and resolved; this improves the accuracy and reliability of the analysis and ultimately leads to better patient outcomes. Furthermore, using multiple tools helps us identify and account for potential biases or limitations of any single method, and comparing the results gives us a more complete understanding of the patient’s condition on which to base more informed decisions.
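
The following sketch illustrates one way to operationalize this cross-check: the scores of the independent methods are first brought onto a common scale (here, z-scores against their respective normative distributions), and a case is flagged for closer inspection when the spread between methods exceeds a tolerance. The tolerance value is an illustrative assumption, not our production setting.

```python
def flag_disagreement(z_scores: dict[str, float], tolerance: float = 1.5) -> dict:
    """Flag a case when independent quantification methods disagree.

    z_scores maps method name -> score expressed on a common (z-score) scale.
    The case is flagged when the largest pairwise difference exceeds the
    tolerance, suggesting data-quality issues, peculiar anatomy or a
    non-standard disease presentation worth reviewing.
    """
    values = list(z_scores.values())
    spread = max(values) - min(values)
    return {
        "spread": spread,
        "flagged": spread > tolerance,
        "methods": z_scores,
    }

# Example: the AI branch deviates from the other two and the case is flagged.
print(flag_disagreement({"standard": -1.9, "ai": 0.4, "model": -1.6}))
```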

We are therefore not competing with mainstream products that showcase AI-based algorithms meant to replace human physicians; we rely instead on the interaction between algorithms and clinicians. We want to engage humans with layers of additional information that are informative and reassuring at the same time. The goal is to improve the clinician’s knowledge and confidence, helping them perform better in the future.

4 Data flow

Hence, we decided to structure the analysis around a three-tiered client-server (hub-cloud) approach, where the client is a lightweight, platform-independent, modern piece of software installed on the clinician’s hardware, and the server side is primarily a collection of cloud-based services orchestrated by a central hub. The client handles user identification, data input, management, and result display; additionally, it has the duty of anonymizing and encrypting medical data before they leave the clinician’s premises.

An expert user (a doctor or, more generally, a healthcare professional, HCP) who uses the platform in a hospital follows a simple and intuitive process to access the analysis. The clinician authenticates on the platform and records the consent of the patient undergoing the clinical examination to the processing of data for research purposes, confirming its acquisition within the client.

The clinician then selects the examination (analysis module), which determines the processing method. The exam imaging data come as DICOM files, which carry both the technical and the sensitive information related to the exam; these pieces of information are selected, verified, and encrypted before being uploaded. DICOM itself, however, is unsuitable for upload for privacy reasons: although the standard does offer security measures, the data stored in DICOM files can be read by anyone with access to the file, and if a file were hacked or fell into the wrong hands, all of the patient’s personal and medical information would be exposed, putting their privacy at risk.

Therefore the DICOM files are pseudonymized and converted to NIfTI, and the package is encrypted using a 256-bit public-private key system. Unlike DICOM, the NIfTI format stores only the image matrix and its basic geometric information, with no patient-identifying header fields, so sensitive personal data are not exposed even if the file is intercepted or falls into the wrong hands. NIfTI files are also more compact than DICOM, easy to parse, and supported by all modern analysis software.
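
As a minimal sketch of this step, the snippet below blanks a handful of identifying DICOM header fields with pydicom, assigns a pseudonym derived from a salted hash, and converts the series to NIfTI with the dicom2nifti package; the tag list, the salt handling and the package choice are simplifying assumptions, and our production client uses a far more exhaustive de-identification profile.

```python
import hashlib
from pathlib import Path

import pydicom
import dicom2nifti

# Simplified list of identifying attributes; a production de-identification
# profile is much longer.
PHI_KEYWORDS = [
    "PatientName", "PatientBirthDate", "PatientAddress",
    "ReferringPhysicianName", "InstitutionName", "AccessionNumber",
]

def pseudonymize_series(dicom_dir: Path, out_dir: Path, salt: str) -> str:
    """Blank identifying tags, assign a pseudonym and convert to NIfTI."""
    out_dir.mkdir(parents=True, exist_ok=True)
    pseudonym = None
    for dcm_path in sorted(dicom_dir.glob("*.dcm")):
        ds = pydicom.dcmread(dcm_path)
        if pseudonym is None:
            # Pseudonym = salted hash of the original PatientID (assumption).
            pseudonym = hashlib.sha256((salt + str(ds.PatientID)).encode()).hexdigest()[:16]
        for keyword in PHI_KEYWORDS:
            if keyword in ds:
                setattr(ds, keyword, "")
        ds.PatientID = pseudonym
        ds.save_as(out_dir / dcm_path.name)
    # NIfTI keeps only the image matrix and geometry, no identifying header.
    dicom2nifti.dicom_series_to_nifti(str(out_dir), str(out_dir / f"{pseudonym}.nii.gz"),
                                      reorient_nifti=True)
    return pseudonym
```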

The client then streams the data package to the server, which starts the required calculations and triggers the cloud services, which are fed with anonymized data only. The results are then collected and streamed back to the client, which decrypts the information and displays the results in clear form.
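
One plausible realization of the encryption step described above is a hybrid scheme: a fresh 256-bit symmetric key protects the data and is in turn wrapped with the server’s public key. The sketch below, based on the cryptography library, shows the client-side half under this assumption; the actual protocol, key management and transport layer of our platform are more elaborate.

```python
import os

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_package(payload: bytes, server_public_key_pem: bytes) -> dict:
    """Encrypt an anonymized data package with AES-256-GCM and wrap the
    session key with the server's RSA public key (hybrid encryption)."""
    session_key = AESGCM.generate_key(bit_length=256)   # 256-bit symmetric key
    nonce = os.urandom(12)
    ciphertext = AESGCM(session_key).encrypt(nonce, payload, None)

    public_key = serialization.load_pem_public_key(server_public_key_pem)
    wrapped_key = public_key.encrypt(
        session_key,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )
    # Only anonymized, encrypted material leaves the client.
    return {"nonce": nonce, "ciphertext": ciphertext, "wrapped_key": wrapped_key}
```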

In the selected analysis module, the algorithm compares the case with standard databases (i.e. validated clinical results and retrospective analyses) and applies calculation methods specific to the various types of exams.

The central hub organizes accounting operations (authentication, number, and type of processing required by users or sites) and manages a backup database of processed results and encrypted meta-data to provide disaster-recovery services.

In addition to accessing the unencrypted result, the client also provides clear visual statistics relevant to each examination, so that one can easily understand the data. Results are stored both on the user’s workstation and in a local database (also encrypted) and can be printed or exported in PDF and XLSX format.

We have taken great care to ensure the privacy of all users. Adherence to GDPR regulations is paramount, and all data sets are encrypted both in transit and at rest. We have also implemented a system of audit logs which track all data access and processing requests and provide a clear audit trail. Our protocol is designed to ensure the highest level of data security, thereby protecting our clients’ sensitive information.
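
A minimal sketch of the kind of audit record we keep is shown below; the field names and the storage location are illustrative, and in production the log itself is encrypted at rest like any other data set.

```python
import json
import time
from pathlib import Path

AUDIT_LOG = Path("audit.log")   # illustrative location

def audit(user_id: str, action: str, resource: str, outcome: str = "ok") -> None:
    """Append one audit record per data access or processing request."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "user": user_id,
        "action": action,        # e.g. "upload", "process", "download_result"
        "resource": resource,    # pseudonymized case identifier
        "outcome": outcome,
    }
    with AUDIT_LOG.open("a") as fh:
        fh.write(json.dumps(record) + "\n")

audit("hcp_042", "process", "case_9f3a1c", "ok")
```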

5 Use cases

At the time of this writing, our team has implemented four analysis pipelines, two of which are related to PET imaging. This includes the analysis of FDG PET for neurodegeneration, which is a powerful tool for detecting, diagnosing, and monitoring diseases such as Alzheimer’s Disease and other forms of dementia.

FDG PET is a tool used to examine brain metabolism. It involves injecting a radioactively labelled glucose analogue, fluorodeoxyglucose (FDG), into the bloodstream and then using a scanner to track how it is taken up in the brain. This allows doctors to detect changes that may indicate diseases like Alzheimer’s or other dementias, and to diagnose them faster and more accurately. We developed an innovative way to track and quantify glucose uptake in specific regions of the brain, which can be used to monitor changes over time. Additionally, we have developed a specialized imaging pipeline that allows us to measure cortical thickness, white matter integrity, and cerebral perfusion.
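
As a sketch of how regional glucose uptake can be quantified from the reconstructed images, the snippet below computes the mean uptake in atlas-defined regions and normalizes it to a reference region; the atlas, the label values and the choice of reference region are assumptions for illustration and do not reproduce our validated pipeline.

```python
import nibabel as nib
import numpy as np

def regional_uptake_ratios(pet_path: str, atlas_path: str,
                           region_labels: dict[str, int],
                           reference_label: int) -> dict[str, float]:
    """Mean FDG uptake per atlas region, normalized to a reference region.

    Assumes the PET image has already been spatially registered to the atlas.
    """
    pet = nib.load(pet_path).get_fdata()
    atlas = nib.load(atlas_path).get_fdata().astype(int)

    reference = pet[atlas == reference_label].mean()
    return {
        name: float(pet[atlas == label].mean() / reference)
        for name, label in region_labels.items()
    }

if __name__ == "__main__":
    # Placeholder file names and labels; any labelled atlas in the same space would do.
    ratios = regional_uptake_ratios(
        "subject_fdg.nii.gz", "atlas_labels.nii.gz",
        region_labels={"posterior_cingulate": 10, "precuneus": 11, "occipital": 12},
        reference_label=99,   # e.g. a pons or whole-cerebellum reference mask
    )
    print(ratios)
```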

In this first example (see fig. 1), we assessed the diagnostic accuracy of visual assessment of FDG-PET scans in distinguishing between two groups of patients: those with prodromal Alzheimer’s Disease (MCI-AD) and those with mild cognitive impairment due to dementia with Lewy bodies (MCI-DLB). The study involved six expert readers who were blind to the diagnosis. The readers were provided with maps obtained by univariate single-subject voxel-based analysis (VBA) and with individual odds-ratio (OR) plots obtained by the volumetric regions of interest (VROI) semiquantitative analysis. The mean diagnostic accuracy of visual assessment was 76.8% and did not significantly benefit from adding the univariate VBA map reading (77.4%), whereas the VROI-derived OR plot reading significantly increased both accuracy (89.7%) and inter-rater reliability. The conclusion was that conventional visual reading of FDG-PET is moderately accurate in distinguishing between MCI-DLB and MCI-AD and is not significantly improved by univariate single-subject VBA, but it is improved by a VROI analysis built on macro-regions, allowing for high accuracy independent of reader skills.

The accuracy of FDG-PET at the individual level in MCI-DLB versus MCI-AD was thus significantly improved with a stepwise approach, from visual to semi-quantitative analysis. Our team has successfully implemented this approach and seen a marked accuracy increase with respect to follow-up clinical end points, achieved through a decrease in manual labor, improved pattern recognition, and the efficient integration of multiple algorithms. This evidence suggests that our system can perform nuclear medicine neuroimaging quantification with an accuracy previously out of reach for visual reading alone.
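
For illustration only, the sketch below shows one way a per-region odds readout of this kind could be built: a logistic model is fitted, region by region, on a clinically validated reference cohort, and the new subject’s regional values are turned into odds of one diagnosis versus the other. This is a simplified stand-in, not the exact VROI construction used in the published study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_vroi_models(reference_values: np.ndarray, labels: np.ndarray) -> list:
    """Fit one logistic model per volumetric region of interest (VROI).

    reference_values: (n_subjects, n_regions) semiquantitative values
    labels: 0 for MCI-DLB, 1 for MCI-AD (reference cohort)
    """
    return [LogisticRegression().fit(reference_values[:, [r]], labels)
            for r in range(reference_values.shape[1])]

def subject_odds(models: list, subject_values: np.ndarray) -> np.ndarray:
    """Per-region odds of MCI-AD vs. MCI-DLB for a single subject."""
    probs = np.array([m.predict_proba(subject_values[[r]].reshape(1, -1))[0, 1]
                      for r, m in enumerate(models)])
    return probs / (1.0 - probs)

# Dummy reference cohort with 6 VROIs, for demonstration only.
rng = np.random.default_rng(0)
ref = rng.normal(size=(200, 6))
lab = rng.integers(0, 2, size=200)
models = fit_vroi_models(ref, lab)
print(subject_odds(models, rng.normal(size=6)))   # values to plot as an OR-style chart
```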

In the same disease context, our team has successfully implemented an amyloid-PET imaging pipeline to quantify the brain amyloid load in patients with Alzheimer’s Disease or other forms of dementia. This technology can also be used to monitor the efficacy of drug treatments over time, and our team is able to provide accurate and reliable quantification for these types of tests. Additionally, we have integrated an AI-based system for automated quantification of PET scans, which helps reduce the burden of manual interpretation and increases accuracy and reliability by detecting patterns and identifying areas of interest that would otherwise be difficult to spot manually. In fig. 2 we show typical cases of negative, borderline and positive brain amyloid burden: these are the kind of images processed by the amyloid-PET pipeline to deliver regional and whole-brain quantification of the amyloid protein accumulation in the brain.
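
A minimal sketch of the whole-brain readout is given below: a composite cortical SUVR is obtained by averaging regional ratios and mapped onto a negative / borderline / positive call. The cutoff values are purely illustrative (they depend on tracer and pipeline) and are not the thresholds used in our validated system.

```python
def amyloid_status(regional_suvr: dict[str, float],
                   negative_cutoff: float = 1.10,
                   positive_cutoff: float = 1.25) -> dict:
    """Composite cortical SUVR and a three-way amyloid call.

    regional_suvr: cortical region -> SUVR (already normalized to the
    chosen reference region). Cutoffs are illustrative only.
    """
    composite = sum(regional_suvr.values()) / len(regional_suvr)
    if composite < negative_cutoff:
        status = "negative"
    elif composite <= positive_cutoff:
        status = "borderline"
    else:
        status = "positive"
    return {"composite_suvr": composite, "status": status}

print(amyloid_status({"frontal": 1.31, "precuneus": 1.40,
                      "lateral_parietal": 1.28, "lateral_temporal": 1.22}))
```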

Besides nuclear imaging analysis, MRI scans are used to study medial temporal lobe atrophy, which is essential for the medical evaluation of neurological diseases and for differentiating between them. In fig. 3 we show the saliency map superimposed on the MRI reference atlas. The map is the outcome of a radiomics and machine learning algorithm that translates the subtle intensity information in the subject’s MRI into an atrophy index, which directly relates to the probability of developing an AD-type dementia.
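
The sketch below gives the flavour of such an index: radiomics-style intensity features are extracted from a medial temporal mask and fed to a classifier trained on a reference cohort, whose probability output is read as an atrophy index between 0 and 1. Both the feature set and the model are illustrative stand-ins for the actual algorithm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def intensity_features(mri: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Simple radiomics-style intensity features from the medial temporal mask."""
    voxels = mri[mask > 0]
    return np.array([voxels.mean(), voxels.std(),
                     np.percentile(voxels, 10), np.percentile(voxels, 90),
                     (voxels ** 2).mean()])

# Illustrative training on a dummy reference cohort (features, AD-dementia label).
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(300, 5))
y_ref = rng.integers(0, 2, size=300)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_ref, y_ref)

def atrophy_index(mri: np.ndarray, mask: np.ndarray) -> float:
    """Probability-like index relating to the risk of AD-type dementia."""
    return float(model.predict_proba(intensity_features(mri, mask).reshape(1, -1))[0, 1])

# Dummy subject: random image and a box-shaped medial temporal mask.
mri = rng.normal(size=(64, 64, 64))
mask = np.zeros_like(mri)
mask[20:30, 20:30, 20:30] = 1
print(atrophy_index(mri, mask))
```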

Our team is also experienced in using nuclear imaging analysis to study the progression of Parkinson’s Disease. We have successfully implemented a scan pipeline that quantifies dopamine transporter density in patients with Parkinsonian syndromes, which helps measure disease severity and track treatment efficacy over time. By design, we have integrated multiple algorithms (both AI and non-AI based) for quantification, which reduces manual interpretation and improves accuracy and reliability. Our system can detect patterns and identify areas of interest more efficiently than traditional methods, and more robustly than any single AI algorithm.
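
A common semiquantitative readout for this kind of scan is the specific binding ratio (SBR) of the striatal regions with respect to a non-specific reference region; the short sketch below computes it from atlas-defined masks. The occipital reference, the label values and the file names are assumptions for illustration.

```python
import nibabel as nib
import numpy as np

def specific_binding_ratios(spect_path: str, atlas_path: str) -> dict[str, float]:
    """SBR = (striatal mean - reference mean) / reference mean.

    Assumes the SPECT image is registered to the labelled atlas.
    Label values below are illustrative.
    """
    img = nib.load(spect_path).get_fdata()
    atlas = nib.load(atlas_path).get_fdata().astype(int)

    reference = img[atlas == 40].mean()          # e.g. occipital cortex
    striatal_labels = {"left_caudate": 1, "right_caudate": 2,
                       "left_putamen": 3, "right_putamen": 4}
    return {name: float((img[atlas == label].mean() - reference) / reference)
            for name, label in striatal_labels.items()}

if __name__ == "__main__":
    # Placeholder file names; substitute real co-registered images.
    print(specific_binding_ratios("subject_dat.nii.gz", "striatal_atlas.nii.gz"))
```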

Our software provides a tailored experience for clinicians that combines the best of both worlds: human insight and quantification. In addition, we can enrich the clinician’s experience by providing the validated examples nearest to the case under study. This offers up-to-date information that is not only accurate but also safeguards against misreading or misinterpretation, which can lead to medical errors. Using both standard and machine learning techniques, we can better detect small deviations in neurological diseases, as well as monitor how they progress.
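
The retrieval of the nearest validated examples can be sketched as a nearest-neighbour search in the space of quantitative scores, as below; the feature choice, the scaling and the number of neighbours are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

def nearest_validated_cases(case_features: np.ndarray,
                            reference_features: np.ndarray,
                            reference_ids: list[str],
                            k: int = 5) -> list[str]:
    """Return the IDs of the k validated cases closest to the new case.

    Features are the per-case quantitative scores (one row per case);
    they are standardized before the Euclidean nearest-neighbour search.
    """
    scaler = StandardScaler().fit(reference_features)
    nn = NearestNeighbors(n_neighbors=k).fit(scaler.transform(reference_features))
    _, idx = nn.kneighbors(scaler.transform(case_features.reshape(1, -1)))
    return [reference_ids[i] for i in idx[0]]

# Dummy reference library of validated, already-quantified cases.
rng = np.random.default_rng(0)
library = rng.normal(size=(1000, 8))
ids = [f"case_{i:04d}" for i in range(1000)]
print(nearest_validated_cases(rng.normal(size=8), library, ids, k=3))
```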

As we continue to develop the system, we plan to add new analyses and features that will open up specific opportunities for neuro-oncology and other brain diseases. This will give us a more holistic approach to neuroimaging quantification, allowing us to better understand the progression of neurological diseases and to create more effective treatments. Ultimately, our goal is to create a comprehensive system that can reliably quantify neurological diseases and provide valuable insights into the progression of these diseases.

6 Ethics and human-AI interaction

As a team of physicists with expertise in medical imaging, we recognize the importance of ethics and trustworthiness when conducting data analysis. We understand that algorithms are only as reliable as the data and assumptions that generate them. While we strive to ensure that the AI algorithms we use are based on sound and reliable data, with transparent and verifiable parameters, human-AI interaction must be carefully and ethically monitored.

One of the main advantages of this interaction is the ability to combine the strengths of both humans and AI. While AI can process large amounts of data quickly and accurately, humans can understand context and make judgments based on experience and intuition. By working together, human readers and AI can improve the accuracy and speed of medical image analysis. The use of AI can also reduce the workload of radiologists and allow them to focus on the most complex cases, leading to improved productivity and efficiency in the diagnostic process. Another advantage of human-AI interaction is the ability to detect patterns or anomalies in medical images that might be missed by either humans or AI alone, which can lead to earlier and more accurate diagnosis and ultimately to better patient outcomes.

Balancing software and human supervision is particularly important in the context of medical images. It provides greater accuracy in data analysis, since both AI algorithms and manual checks are used to ensure reliability and transparency; it helps foster trust among users, who know that their data are secure and free from bias or manipulation; and it allows for a better understanding of complex systems, since the combination of human insight and AI algorithms can provide deeper insights into disease progression and effective treatments.

While our system is able to detect patterns and quantify biomarkers more quickly and accurately than manual methods, it is important to keep the clinician as part of the equation.

Replacing a clinician with software in the context of medical image analysis would not be a good idea, for several reasons.

First, medical image analysis is a complex task that requires not only technical expertise, but also medical knowledge and clinical judgment. Software, no matter how advanced, cannot fully replicate the expertise and intuition that a clinician brings to the diagnostic process.

Second, medical images are often used to make critical decisions regarding patient care, and software, no matter how accurate, can never replace the human touch that a clinician provides.

Third, the use of software alone can lead to over-reliance on the technology and the potential for missed or incorrect diagnoses. This can ultimately lead to negative consequences for patients and the healthcare system.

Lastly, the use of AI-based software in medical imaging is still at an early stage, and it is not yet clear how well these systems will perform in practice or how accurate they will be, especially when images are of poor quality or show uncommon abnormalities.

In conclusion, while AI-based software can be a valuable tool in the diagnostic process, it should be used in conjunction with, not in place of, a qualified clinician. This can lead to the best outcomes for patients.

We want to provide clinicians with a tool that can help them make better and faster decisions. With this in mind, our nuclear imaging quantification system is designed to be easy to use with minimal training required. Furthermore, it is constantly updated to ensure reliability and accuracy, while also incorporating the latest advancements in AI technology.