Is Artificial Intelligence the Most Reliable Way to Predict Mortality After Liver Transplantation?

Marcos Bruna Esteban; Eva Montalvá; Antonio J. Serrano-López; Joan Vila-Francés; Javier Maupoey; Juan Vila J

Research Article

Marcos Bruna Esteban¹*, Eva Montalvá², Antonio J. Serrano-López³, Joan Vila-Francés³, Javier Maupoey² and Juan Vila J²
¹Department of General and Digestive Surgery, General University Hospital of Valencia, Spain
²University and Polytechnic Hospital of La Fe, Spain
³Department of Data Analysis Laboratory, School of Engineering at the University of Valencia, Spain

*Corresponding author: Marcos Bruna Esteban, Department of General and Digestive Surgery, General University Hospital of Valencia, Spain

Published: 20 Sep, 2018
Cite this article as: Esteban MB, Montalvá E, Serrano-López AJ, Vila-Francés J, Maupoey J, Juan Vila J. Is Artificial Intelligence the Most Reliable Way to Predict Mortality After Liver Transplantation? Clin Surg. 2018; 3: 2122.

Abstract

Introduction: Graft allocation in Liver Transplantation (LT) should be based on the greatest survival benefit to the patients awaiting transplantation. This study developed a predictive model to determine recipient mortality 1 year after LT.
Materials and Methods: We developed Artificial Neural Network (ANN) and Logistic Regression (LR) models and compared their results with the Balance of Risk (BAR), Survival Outcomes Following Transplantation (SOFT), and Model for End-stage Liver Disease (MELD) scores. The Development Group used to create the predictive models included 1235 valid cases, while 200 consecutive transplant patients since January 2009 were included in the Generalization Group for internal validation.
Results: The area under the curve (AUC) of the ANN model (0.82) was higher than that of the LR model (0.68). For the Generalization Group, the MELD, SOFT, and BAR scores had AUCs of 0.56, 0.57, and 0.62, respectively. The ANN model had a significantly higher AUC than that of each score (MELD, p=0.005; SOFT, p=0.009; BAR, p=0.02).
Conclusion: The ANN model was superior to the LR model and to other scales currently used to predict mortality during the first year after LT and to match a particular graft with potential recipients.
Keywords: Artificial Intelligence; Liver Transplantation; Recipient Survival; Waitlist Management

Abbreviations

ANN: Artificial Neural Network; AUC: Area Under the Curve; BAR: Balance of Risk; DRI: Donor Risk Index; HBV: Hepatitis B Virus; HCV: Hepatitis C Virus; IQR: Interquartile Range; LR: Logistic Regression; LT: Liver Transplantation; MELD: Model for End-Stage Liver Disease; NPV: Negative Predictive Value; PPV: Positive Predictive Value; SOFT: Survival Outcomes Following Transplantation

Introduction

Liver Transplantation (LT) has improved markedly and recipient survival has increased progressively [1], with survival rates close to 90% at 1 year after transplantation [2]. Although Spain has the highest organ donation rate in the world [3], the supply remains insufficient because of growing demand and the increasing number of patients on the waiting list. It is therefore necessary to construct predictive mortality and allocation models that make this process more efficient. Being able to predict the expected survival with a specific graft is an important step in selecting the best recipient in each case [4]. Many risk scales and predictive models, based on large numbers of cases and years of experience and follow-up, have demonstrated relationships between different variables and liver recipient mortality, but none of them constitutes a universal model able to predict the outcome. There is, however, evidence that the accumulation of risk factors increases the likelihood of a worse post-transplant outcome [5]. This study therefore developed a predictive model to determine recipient mortality 1 year after LT, based on donor, recipient, and graft variables known preoperatively. For this purpose, we developed Artificial Neural Network (ANN) and Logistic Regression (LR) models and compared their results with other predictive scores.

Figure 1: Flowchart of included cases in the Development Group.

Materials and Methods

This was a retrospective, descriptive, analytical study of prospectively collected data from a cohort of 1235 adult liver recipients followed for the first year after transplantation. Using defined recipient, donor, and graft variables known before transplantation, this cohort was used to develop two predictive models of recipient mortality 1 year after LT: one based on LR and the other based on an ANN. The study was approved by the local research ethics committee. To create the predictive models, the Development Group included all patients who underwent orthotopic LT in the Hepatobiliopancreatic Surgery and Transplantation Unit of the Hospital Universitario y Politécnico La Fe (Valencia, Spain) from November 1994 to December 2008. We excluded patients younger than 14 years; recipients of a partial liver graft or a graft from a non-heart-beating donor; domino transplantations, combined transplantations, and retransplantations; and recipients followed for less than 1 year (Figure 1). We applied these predictive models to a population of 200 consecutive patients transplanted in the same unit since January 2009 (Generalization Group), using the same inclusion and exclusion criteria. Recipient mortality was considered related to the procedure in the following situations: surgical complications, graft dysfunction, recurrence of the primary disease, or complications due to immunosuppressive therapy.
We used data for 29 variables to create the predictive models (Table 1). A univariate, descriptive, comparative analysis was used to compare the Development and Generalization Groups, using the t-test for independent samples for continuous variables and the chi-square test for qualitative variables. The statistical analysis and the LR model were performed using SPSS® ver. 20.0 for Windows, including all variables selected in the study and using a sequential withdrawal (backward stepwise) procedure based on the likelihood ratio. MATLAB® ver. 2010 was used to develop the predictive model based on the ANN, which consisted of a multilayer perceptron with feed-forward connections and supervised learning. To maximize the information available to develop and compare the predictive models, missing values in the Development Group were completed using the hot-deck pairing method [6], which fills incomplete cases with the values of the most similar cases in our database. The predictive capacity of each model was assessed using the area under the receiver operating characteristic curve (AUC). Finally, the probability thresholds of the predictive models were determined by a multidisciplinary team so as to maximize utility, with the aim of improving allocation efficiency and transplantation results. For this purpose, a utility value was assigned to each possible test result (Table 2), with the maximum value given to the correct prediction of recipient mortality. All other values were agreed on in relation to this maximum.
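The hot-deck completion of missing values described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the similarity measure (a root-mean-square difference over the variables observed in both cases), the function names, and the toy records are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def distance(a: Record, b: Record, keys: List[str]) -> float:
    """Root-mean-square difference over the variables observed in both records."""
    shared = [k for k in keys if a[k] is not None and b[k] is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in shared) / len(shared))


def hot_deck_impute(records: List[Record], keys: List[str]) -> List[Record]:
    """Fill each missing value with the value from the most similar complete case."""
    donors = [r for r in records if all(r[k] is not None for k in keys)]
    completed = []
    for rec in records:
        filled = dict(rec)  # copy, so the input records are left untouched
        if donors and any(rec[k] is None for k in keys):
            nearest = min(donors, key=lambda d: distance(rec, d, keys))
            for k in keys:
                if filled[k] is None:
                    filled[k] = nearest[k]
        completed.append(filled)
    return completed


# Toy records (invented values, not study data).
cases = [
    {"age": 55.0, "bmi": 26.2, "creatinine": 0.9},
    {"age": 30.0, "bmi": 22.0, "creatinine": 0.7},
    {"age": 54.0, "bmi": None, "creatinine": 1.0},
]
completed = hot_deck_impute(cases, ["age", "bmi", "creatinine"])
print(completed[2]["bmi"])
```

In this toy example the third case borrows its BMI from the first case, which is its closest complete neighbour. With real data the variables would normally be standardized first, so that no single variable dominates the distance.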
Once the LR and ANN models had been created from the Development Group, we applied them to the Generalization Group and analyzed their predictive capacity. The results were evaluated by a comparative analysis of AUCs using the Hanley and McNeil test [7]. Similarly, the Model for End-stage Liver Disease (MELD), Survival Outcomes Following Transplantation (SOFT), and Balance of Risk (BAR) [8,9] scores were applied to the Generalization Group and compared with the predictive models.
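For readers unfamiliar with this kind of AUC comparison, the following Python sketch shows one common form of it: the standard-error approximation of Hanley and McNeil and a z statistic for the difference between two AUCs. The correlation term r between the two AUC estimates is not reported in the text, so the example assumes r = 0 (independent curves); this is an assumption and the output is not expected to reproduce the published p-values exactly.

```python
import math
from typing import Tuple


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC (Hanley and McNeil formula)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def compare_aucs(auc_a: float, auc_b: float, n_pos: int, n_neg: int,
                 r: float = 0.0) -> Tuple[float, float]:
    """z statistic and two-sided p-value for the difference between two AUCs.

    r is the correlation between the two AUC estimates; r = 0 treats the
    curves as independent (an assumption, since r is not given in the text).
    """
    se_a = hanley_mcneil_se(auc_a, n_pos, n_neg)
    se_b = hanley_mcneil_se(auc_b, n_pos, n_neg)
    z = (auc_a - auc_b) / math.sqrt(se_a ** 2 + se_b ** 2 - 2.0 * r * se_a * se_b)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p


# Generalization Group: 19 deaths and 181 survivors; AUCs as reported (ANN vs. MELD).
se_ann = hanley_mcneil_se(0.82, n_pos=19, n_neg=181)
z, p = compare_aucs(0.82, 0.56, n_pos=19, n_neg=181)
print(round(se_ann, 4), round(z, 2), round(p, 4))
```

Because the models and scores are evaluated on the same 200 patients, the AUC estimates are correlated in practice; supplying a positive r would shrink the denominator and make the test less conservative than this sketch.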

Figure 2: Comparison of AUROCs for ANN vs. LR models in the Development group (a) and in the Generalization group (b). AUROCs for ANN and LR models and BAR, SOFT and MELD scores in the Generalization group (c).

Figure 3: Importance of the variables and AUC of the ANN model when each of them is removed from the model.

Table 1

Variable	Development Group: median (IQR) or n (%)	Generalization Group: median (IQR) or n (%)	p
RECIPIENT VARIABLES
Age (years)	55 (47-60)	54 (48-60)	0.67
BMI (kg/m²)	26.2 (23.6-29.3)	27.4 (24.7-30)	0.34
Bilirubin blood level (mg/dl)	2.7 (1.4-4.7)	3.1 (1.6-5.8)	0.21
Proteins blood level (g/dl)	7.1 (6.4-7.7)	6.9 (6.1-7.5)	0.35
Albumin blood level (g/dl)	3.2 (2.8-3.7)	3.1 (2.7-3.7)	0.73
Creatinine blood level (mg/dl)	0.9 (0.7-1.1)	0.9 (0.7-1.1)	0.83
Quick index (%)	62 (48-76)	58 (48-74)	0.56
Cardiovascular risk			0.09
- No	464 (37.8)	91 (45.5)
- 1-2 factors	740 (60.3)	103 (51.5)
- More than 2 factors	24 (2)	6 (3)
- Total	1228	200 (100)
Nephropathy			0.15
- Yes	102 (8.3)	25 (12.5)
- No	1120 (91.7)	175 (87.5)
- Total	1222 (100)	200 (100)
Use of diuretics			0.21
- Yes	764 (61.9)	136 (68)
- No	460 (38.1)	64 (32)
- Total	1224 (100)	200 (100)
Child			0.13
- A	232 (18.8)	37 (18.5)
- B	459 (37.2)	61 (30.5)
- C	544 (44)	102 (51)
- Total	1235 (100)	200 (100)
Portal thrombosis			0.09
- Portal thrombosis < 50%	1053 (85.3)	185 (92.5)
- Portal thrombosis > 50%	116 (9.4)	8 (4)
- Total portal thrombosis + partial SMV	61 (4.9)	6 (3)
- Total thrombosis of portal vein and SMV	5 (0.4)	1 (0.5)
- Total	1235 (100)	200 (100)
Etiologic diagnosis			0.22
- HCV cirrhosis	624 (50.5)	92 (46)
- HBV cirrhosis	113 (9.1)	14 (7)
- Alcoholic cirrhosis	313 (25.3)	61 (30.5)
- Cholestatic cirrhosis	45 (3.6)	8 (4)
- Toxic and medication cirrhosis	13 (1.1)	7 (3.5)
- Other cirrhosis	88 (7.1)	12 (6)
- Other diseases	39 (3.2)	6 (3)
- Total	1235 (100)	200 (100)
Hepatocellular carcinoma (HCC)			0.16
- No HCC	835 (67.6)	130 (65)
- HCC and Milan +	339 (27.4)	65 (32.5)
- HCC and extra-Milan	61 (4.9)	5 (2.5)
- Total	1235 (100)	200 (100)
Surgery (elective/urgent)			0.33
- Urgent	33 (2.7)	9 (4.5)
- Elective	1202 (97.3)	191 (95.5)
- Total	1235 (100)	200 (100)
DONOR VARIABLES
Age (years)	51 (34-63)	62 (49-71)	<0.001
BMI (kg/m²)	25.3 (23.4-27.6)	26.8 (24.7-29.4)	0.43
Bilirubin blood level (mg/dl)	0.7 (0.5-0.9)	0.6 (0.4-0.9)	0.78
Sodium blood level (mEq/l)	147 (141-155)	148 (142-156)	0.65
ALT blood level (U/l)	25 (16-46)	24 (16-36.7)	0.32
Length of stay in ICU (days)	2 (1-4)	2 (1-4)	0.89
Cardiovascular risk			<0.001
- No	464 (37.6)	50 (25)
- 1-2 factors	705 (57.1)	82 (41)
- More than 2 factors	47 (3.8)	68 (34)
- Total	1216 (100)	200 (100)
Hemodynamic instability			0.08
- No	741 (60.5)	107 (53.5)
- Low blood pressure	388 (31.7)	69 (34.5)
- Low blood pressure + CRA	96 (7.8)	24 (12)
- Total	1225 (100)	200 (100)
Use of vasopressors			<0.001
- Yes	731 (59.2)	50 (25)
- No	493 (40.8)	150 (75)
- Total	1224 (100)	200
Cause of death			<0.001
- CVI	765 (61.9)	152 (76)
- CET traffic accident	291 (23.6)	12 (6)
- CET other	99 (8)	22 (11)
- Anoxia	57 (4.6)	12 (6)
- Others	23 (1.9)	2 (1)
- Total	1235 (100)	200 (100)
Macroscopic steatosis			0.01
- No steatosis	962 (82.7)	134 (67)
- Low steatosis (less than 15%)	150 (12.9)	36 (18)
- Moderate steatosis (15-30%)	43 (3.7)	25 (12.5)
- High steatosis (more than 30%)	8 (0.7)	5 (2.5)
- Total	1163 (100)	200 (100)
Atherosclerosis			0.45
- No or low	729 (62.5)	133 (66.5)
- Moderate or high	437 (37.5)	67 (33.5)
- Total	1166 (100)	200 (100)
COMPATIBILITY VARIABLES
Blood compatibility			0.87
- Same group	1211 (98.1)	198 (99)
- Compatible	24 (1.9)	2 (1)
- Total	1235 (100)	200 (100)
Gender			0.34
- Recipient Male – Donor Male	560 (45.3)	91 (45.5)
- Recipient Male – Donor Female	307 (24.9)	64 (32)
- Recipient Female – Donor Male	185 (15)	22 (11)
- Recipient Female – Donor Female	183 (14.8)	23 (11.5)
- Total	1235 (100)	200 (100)

Table 1: Descriptive and comparative analysis of the variables in the Development and Generalization groups.
IQR: Interquartile Range; BMI: Body Mass Index; SMV: Superior Mesenteric Vein; HCV: Hepatitis C Virus; HBV: Hepatitis B Virus; ALT: Alanine Aminotransferase;
ICU: Intensive Care Unit; CRA: Cardiorespiratory Arrest; CVI: Cerebrovascular Injury; CET: Cranioencephalic Trauma.
Cardiovascular risk factors: hypertension, diabetes, smoking, cardiopathy and cerebrovascular accident.
Nephropathy: previous kidney disease, acute kidney failure or creatinine blood level higher than 1.5 mg/dl.
Low blood pressure: systolic blood pressure lower than 90 mmHg for 30 minutes.
Atherosclerosis: macroscopic evaluation of the aortic wall, performed by the surgeon at the time of extraction.

Table 2: Utility values.

Table 3: Causes of recipient mortality related to the procedure during the first year after LT in the Development and Generalization groups.

Results

From November 1994 to December 2008, 1435 transplants were performed at our center. After applying the exclusion criteria, 1235 valid cases were included in the Development Group (Figure 1). Since January 2009, 200 cases were included in the Generalization Group. During the first year after transplantation, 12.1% of the recipients (150 patients) in the Development Group died, with a median survival of 11 (IQR 10-12) months. In the Generalization Group, 19 patients (9.5%) died, with a median survival of 11 (IQR 10-12) months. The main causes of mortality were infections and recurrence of the primary liver disease, and the incidence of each was similar in both groups (Table 3). Univariate analysis (Table 4) showed that the variables significantly (p<0.05) related to recipient mortality during the year after transplantation were donor age, recipient age, etiological diagnosis of cirrhosis, and the presence of nephropathy, hepatocellular carcinoma, and portal thrombosis in the recipient.
The LR model was obtained after 22 steps (Additional Information 1). The AUC of the LR model was 0.72 (95% CI 0.68–0.76) in the Development Group (Figure 2) and 0.68 (95% CI 0.54–0.82) when the model was applied to the Generalization Group (Figure 2). For the assigned utility values, the sensitivity, specificity, and accuracy were 42.1%, 84.5%, and 80.5%, respectively, with a positive predictive value (PPV) of 22.2% and a negative predictive value (NPV) of 93.2% (Additional Information 2).
The ANN model was based on a fully connected neural network with 42 input neurons, 27 hidden neurons, and one output neuron, which gives the recipient's risk of mortality during the first year after LT. The analyzed variables had different levels of importance in the model; Figure 3 shows the AUC of the model after removing each of them.
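To make the 42-27-1 architecture concrete, the following Python sketch runs one forward pass through a fully connected network of that shape. Only the layer sizes come from the text; the activation functions (tanh in the hidden layer, logistic sigmoid at the output), the random untrained weights, and the placeholder inputs are assumptions, so the printed risk has no clinical meaning.

```python
import math
import random
from typing import List, Tuple

N_INPUT, N_HIDDEN, N_OUTPUT = 42, 27, 1  # layer sizes reported in the text

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights and zero biases; training would adjust these."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x: List[float], hidden: Layer, output: Layer) -> float:
    """One forward pass: inputs -> tanh hidden layer -> sigmoid output."""
    w_h, b_h = hidden
    w_o, b_o = output
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_h, b_h)]
    z = sum(w * hi for w, hi in zip(w_o[0], h)) + b_o[0]
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
hidden_layer = init_layer(N_INPUT, N_HIDDEN, rng)
output_layer = init_layer(N_HIDDEN, N_OUTPUT, rng)

# Trainable parameters: weights plus biases of both layers.
n_params = (N_INPUT * N_HIDDEN + N_HIDDEN) + (N_HIDDEN * N_OUTPUT + N_OUTPUT)

x = [rng.random() for _ in range(N_INPUT)]  # placeholder inputs scaled to [0, 1]
risk = forward(x, hidden_layer, output_layer)
print(n_params, round(risk, 3))
```

A network of this shape has 1189 trainable parameters. The 42 inputs exceed the 29 clinical variables, which would be consistent with categorical variables being expanded into several indicator inputs, although the text does not state how the inputs were encoded.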
Based on the previously defined utility values, a threshold of 0.28 (0.2 for the LR model) was obtained to discriminate between positive and negative tests. In the Development Group, the ANN model obtained an AUC of 0.81 (95% CI 0.77-0.85) (Figure 2). In the Generalization Group (Figure 2), we obtained an AUC of 0.82 (95% CI 0.68-0.96), and the sensitivity, specificity, and accuracy were 68.4%, 86.1%, and 84.5%, respectively, with a PPV of 34.2% and an NPV of 96.3% (Additional Information 3). When we compared the AUCs of the predictive models in both groups, the ANN model was superior to the LR model, with a statistically significant difference (p<0.001) in the Development Group (Figure 3).
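The relationship between these percentages can be checked with a short Python sketch. The confusion-matrix counts below are not published in the text; they are back-calculated from the reported percentages and the 19 deaths among 200 patients in the Generalization Group, and should be read as an assumption that reproduces the reported figures to within rounding.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard diagnostic-test measures from a 2x2 confusion matrix."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts back-calculated from the reported percentages (assumption, not published data).
ann = diagnostic_metrics(tp=13, fp=25, tn=156, fn=6)
lr = diagnostic_metrics(tp=8, fp=28, tn=153, fn=11)

for name, metrics in (("ANN", ann), ("LR", lr)):
    print(name, {k: round(100 * v, 1) for k, v in metrics.items()})
```

Under these counts, the ANN model would miss 6 of the 19 deaths, compared with 11 for the LR model, at a similar number of false positives.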
The MELD, SOFT, and BAR scores were applied in the Generalization Group, obtaining AUCs of 0.56 (95% CI 0.41-0.71), 0.57 (95% CI 0.42-0.71), and 0.62 (95% CI 0.48–0.75), respectively. The AUC of the ANN model was significantly higher than each of these: MELD, p=0.005; SOFT, p=0.009; BAR, p=0.02 (Figure 2).

Table 4: Univariate analysis.

Discussion

Artificial intelligence and ANN-based models can produce predictive results from large databases, such as a liver transplantation series. These models are inspired by the structure of the brain and, owing to their plasticity, can detect complex and often non-linear relationships among variables. The use of functional and hepatic status assessment scores such as MELD, SOFT, and BAR has helped to optimize waiting list prioritization and to decrease short- and long-term mortality after LT. Nevertheless, about 10% of liver recipients die within 1 year of the intervention. It is therefore necessary to continue searching for a method that makes graft allocation to a specific recipient as efficient as possible [10,11]. In this study, ANN methodology was used to build a model that proved superior at predicting recipient mortality during the first year after transplantation compared with a model created with LR and with other classic scores. The ANN model included 29 variables related to recipient mortality after transplantation, all of which are easily collected during the preoperative period. Only two of them are subjective: macroscopic graft steatosis and donor atherosclerosis. This subjectivity is limited, however, because multivisceral extraction is usually performed by highly experienced surgeons, and only five surgeons in our group perform the extraction and the macroscopic evaluation of the graft and the donor aorta, systematically using specific protocols and scales validated by a multidisciplinary team.
The characteristics of the liver recipients in the Development Group were similar to those published in other studies and in the European Liver Transplantation Registry [12], with a median age of 55 years [13]. More than 60% of them were male and they had some cardiovascular risk factors, as in other series [14]. The main indications for transplantation were cirrhosis due to chronic Hepatitis C Virus (HCV) infection and alcoholic cirrhosis [15]. Overall, 40% were classified in Child group C, as in other groups [15]. On the other hand, the donor characteristics have changed over time. The donors are now older, with more associated diseases and more frequent deaths due to cerebrovascular injury [16]. Although the data used in these models were collected over a long period of time, future advances in treatment and changes in the indications and characteristics of these patients could alter the utility of these predictive models. Our recipient survival is close to 90% at 1 year after transplantation, which is similar to published values [17]. Although the quality of implanted grafts has been worsening, recipient survival has been maintained over time. The variable with the greatest prognostic relevance was the recipient's liver disease. The ages of the donor and recipient [18] and the cause of the donor's death were also important, as other studies have shown [19].
The predictive results of these models are clinically sound. However, their sensitivity is low compared with their high specificity (close to 90%), a consequence of the assigned utility values, which were chosen with the main objective of avoiding recipient deaths. Similarly, the positive predictive values are low in both models, whereas the negative predictive values exceed 90%, yielding a test that rarely fails to predict recipient mortality. Consequently, when the test is positive, the best option is not to perform the transplantation in that recipient and to assign the graft to another recipient with better expectations. We noted the superior predictive capacity of the ANN model versus the LR model and the currently used BAR, SOFT, and MELD scores. This better performance of the ANN could be explained by the characteristics of the clinical data used and by the limitations of LR and the other models, which are based on a small number of variables to simplify their applicability. An ANN can establish more complex relationships among variables and, with large databases, obtain more reliable results than models created with LR [20]. After the training and learning process, the interconnections that an ANN establishes among input variables give a result that can be used as a risk predictor.
A few publications have used this mathematical model in LT. Briceño et al. [21] published a multicenter study of more than 1000 transplantations based on this methodology. A total of 57 variables for each donor-recipient pair were used to create 211 predictive models that predicted the probability of graft survival (AUC: 0.80) and graft loss (AUC: 0.82) 3 months after transplantation. The end point of our model is different, namely recipient survival 1 year after transplantation, and it yields only a dichotomous result (dead or alive), with an accuracy similar to that of Briceño's model. Their model was also superior to the MELD, D-MELD, P-SOFT, BAR, and DRI scales [21]. Ibañez et al. [22] developed LR and ANN models to predict transplant failure 3 months after transplantation. They selected 19 variables from donor, recipient, and operative data and obtained an AUC of 0.81 in the validation cohort with the ANN model, similar to our results. Our ANN-based predictive model maintains the accuracy of these models at 1 year after transplantation. To facilitate the use of our predictive model, we have developed an application available at http://emac.uv.es/liver (Figure 4). After all variables have been entered, a predictive result is obtained. This result could facilitate decision-making when allocating available grafts, based on the expected mortality for each particular recipient.
The model based on the ANN was superior to the LR model and other scales currently used to predict mortality during the first year after LT and to match a particular donor with a potential recipient. However, multicenter validation of this model is necessary before it can be considered a valid tool for improving the efficiency of the management of LT wait lists [12].

Figure 4: Web application (http://emac.uv.es/liver/index_eng.php).

Table 2

Test result	Prediction	Reality	Value	Justification
TRUE POSITIVE	Death	Death	+1	This is the main purpose of the study: to detect the probability of recipient mortality 1 year after LT.
TRUE NEGATIVE	Alive	Alive	+0.2	It is not important from a clinical and economic point of view after the transplantation, but it can be very valuable if considered before the procedure.
FALSE POSITIVE	Death	Alive	-0.2	It involves neither expenses or special treatments nor the recipient's death. It is a mistake, but its direct consequences are not serious.
FALSE NEGATIVE	Alive	Death	-0.8	It is a very serious error. It should be avoided because its associated costs are very high (years of life lost, retransplantation, etc.).
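One way to read these utility values is as weights in a threshold search: for each candidate probability cut-off, the test results are scored with the values above and the cut-off with the highest total is kept. The Python sketch below illustrates that idea under stated assumptions; the utility values come from the table, but the search procedure, the candidate grid, and the toy predictions are hypothetical and are not the procedure or data of the study.

```python
from typing import List, Sequence

# Utility values for each test result, as listed in the table above.
UTILITY = {"TP": 1.0, "TN": 0.2, "FP": -0.2, "FN": -0.8}


def total_utility(probs: Sequence[float], died: Sequence[int], threshold: float) -> float:
    """Sum of utilities when a predicted risk >= threshold counts as a positive test."""
    total = 0.0
    for p, d in zip(probs, died):
        predicted_death = p >= threshold
        if predicted_death and d:
            total += UTILITY["TP"]
        elif predicted_death and not d:
            total += UTILITY["FP"]
        elif not predicted_death and d:
            total += UTILITY["FN"]
        else:
            total += UTILITY["TN"]
    return total


def best_threshold(probs: Sequence[float], died: Sequence[int],
                   candidates: List[float]) -> float:
    """Candidate threshold with the highest total utility (first one wins on ties)."""
    return max(candidates, key=lambda t: total_utility(probs, died, t))


# Toy predictions and outcomes (invented, not study data); 1 = died within 1 year.
probs = [0.05, 0.10, 0.15, 0.22, 0.30, 0.35, 0.60, 0.80]
died = [0, 0, 0, 0, 1, 0, 1, 1]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

best = best_threshold(probs, died, candidates)
print(best, total_utility(probs, died, best))
```

Because a false negative costs four times as much as a false positive, the search favours low cut-offs that catch every death even at the price of some false alarms, which matches the low thresholds reported in the Results.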

Table 3

Cause of death (subcategories)	Development Group: n (by subcategory)	Development Group: %	Generalization Group: n (by subcategory)	Generalization Group: %
Immunosuppression (sepsis, neoplasm, other)	90 (80, 8, 2)	7.3	9 (8, -, 1)	4.5
Recurrence of disease (Re-VCC, Re-HCC, primary graft failure, acute rejection, other)	49 (25, 12, 8, 2, 2)	4	10 (6, 2, 2)	5
Surgery (hemorrhage, hepatic artery thrombosis, other)	11 (6, 3, 2)	0.8
Total	150	12.1	19	9.5

Table 4

Recipient variables	p
Age	<0.001
BMI	0.98
Cardiovascular risk	0.11
Nephropathy	<0.001
Use of diuretics	0.46
Child	0.75
Urgent	0.81
Etiologic diagnosis	<0.001
Hepatocellular carcinoma	0.05
Blood bilirubin level	0.23
Blood proteins level	0.18
Blood albumin level	0.65
Blood creatinine level	0.35
Quick index	0.30
Portal thrombosis	0.03
Donor variables
Age	<0.001
BMI	0.09
Cardiovascular risk	0.19
Cause of death	0.40
Length of stay in ICU	0.64
Hemodynamic instability	0.06
Use of vasopressors	0.21
Blood sodium level	0.72
Blood bilirubin level	0.14
Blood ALT level	0.93
Atherosclerosis	0.06
Macroscopic steatosis	0.34
Donor-recipient compatibility
Gender	0.03
Blood compatibility	0.37

Table 3: Causes of recipient mortality related to the procedure during first year after LT in the Development and Generalization groups.
VCC: Virus C Cirrhosis; HCC: Hepatocellular Carcinoma