BPLT^{+}: A Bayesian-based personalized recommendation model for health care
BMC Genomics volume 14, Article number: S6 (2013)
Abstract
In this paper, we propose an Advanced Bayesian-based Personalized Laboratory Tests recommendation (BPLT^{+}) model. Given a patient, we estimate whether a new laboratory test should belong to a "taken" or a "not-taken" class. We use the Bayesian method to build a weighting function for a laboratory test and the given patient. A higher weight means that the laboratory test has a higher probability of being "taken" by the patient and a lower probability of being "not-taken" by the patient. For the sake of effectiveness and robustness, we further integrate several modified smoothing techniques into the model. To evaluate the BPLT^{+} model objectively, we propose a framework in which the data set is randomly split into a training set, a validation input set and a validation label set. A training matrix is generated from the training data set. Then, instead of accessing the training data set repeatedly, we use this training matrix to predict the laboratory tests on the validation input set. Finally, the recommended ranking list is compared with the validation label set using our proposed metric CorrectRate_{X}. We conduct experiments on real medical data, and the experimental results show the effectiveness of the proposed BPLT^{+} model.
Background
Large amounts of clinical laboratory test data are collected and stored every day. Therefore, there is an increasing need for analyzing and utilizing the laboratory test data. The problem we address in this paper is to recommend laboratory tests for given patients. Health care recommendation problems have drawn researchers' attention for years. However, few studies have been conducted on the clinical laboratory test recommendation problem.
The medical data we work on contain several years of patients' laboratory test records. Figure 1 shows an example of the data format. Formally, the laboratory test prediction problem can be described as follows [1]: "Given a set of patients P = {p_{1}, p_{2}, ..., p_{n}} and a set of laboratory tests T = {test_{1}, test_{2}, ..., test_{M}}, each patient p_{j} has done tests test_{j,1}, ..., test_{j,k_j}. If a doctor would like to assign a new test for patient p_{j}, which test in T should be chosen?"
Computer systems have played an important role in health care for years [2–8]. Statistical algorithms [9–12] play an important role in investigating health care data. The work in [13, 14] extracts chemical keywords from a query patent by analyzing word frequency and the word's effect over the data collection. Bayesian learning is a widely used approach that shows good performance [15–19]. A semantic-based association rule mining approach is proposed to model the medical query contexts in [20]. Using a novel classifier based on the Bayesian discriminant function, Raymer et al. [21] present a hybrid algorithm that employs feature selection and extraction to isolate salient features from large medical and other biological data sets. Martín and Pérez [22] analyze the robustness of the optimal action in a Bayesian decision making problem in the context of health care. The work in [23, 24] studies the association between two words by simulating the impact of words in documents in the context of information retrieval. A probabilistic survival model is derived from survival analysis theory for measuring aspect novelty of genomics data [25]. A mixture Markov model is proposed to investigate user navigation patterns so that a personalized recommendation system can be built for each user [26]. In our previous work [1], we proposed a laboratory test prediction model, which objectively determines whether a laboratory test is associated with a patient. This paper is a significant extension of [1].
Smoothing [27] is a technique to create an approximating function that attempts to capture important patterns in the data, while leaving out noise or other fine-scale structures/rapid phenomena. Smoothing techniques have been used in many fields to improve accuracy [28]. Based on the basic Bayesian algorithm and smoothing techniques, we propose an Advanced Bayesian-based Personalized Laboratory Tests recommendation (BPLT^{+}) model to investigate the correlation among laboratory tests for each patient. Evaluation is a crucial issue in the health care domain [29]. Some previous health care researchers evaluate via patient interaction [30] or statistics [31]. We present a metric CorrectRate_{X} that employs the idea of Mean Average Precision (MAP) [32] from the Information Retrieval domain.
Four contributions are presented in this paper. First, we learn the associations among laboratory tests and make personalized recommendations to patients without human interaction. Second, we integrate modified smoothing techniques to improve the personalized recommendation model and propose the BPLT^{+} model. Third, we propose a framework to randomly generate a training data set, a validation input set and a validation label set. Fourth, we use an objective evaluation metric for personalized recommendation systems without patient interaction.
Methods
Bayesian-based personalized laboratory tests recommendation (BPLT) model
Here we assume that the laboratory tests for a patient are associated with each other. For instance, if a patient is suspected to have diabetes, the doctor will usually assign both a Hemoglobin test and a Glucose Fasting test to this patient. We can see that there exists an association between Hemoglobin and Glucose Fasting with respect to some hidden information, diabetes in this case. In other words, if a patient is assigned a Hemoglobin test, then it is very likely that this patient should also take a Glucose Fasting test. In this section, we build a model for learning the associations among laboratory tests, inferring the associations between patients and laboratory tests, and thereby recommending new laboratory tests to the patients. We regard the test recommendation problem as a special classification problem, where a test belongs to either a "taken" or a "not-taken" class. We use the Bayesian classifier as our basic classifier, and modify it into a personalized ranking model.
Basic concept: Bayesian classifier
A classification problem is the following [33]: given a set of training instances, each described by a set of n attributes and each belonging to exactly one of a fixed number of possible classes, learn to classify new, unseen objects. In addition, each attribute has a fixed number of possible values. We use the naive Bayesian classifier as our basic classifier in this paper, since it directly evaluates the probability of taking a test and the conditional probability between two tests. Moreover, a naive Bayesian classifier is easy to construct and performs surprisingly well in classification, even though the conditional independence assumption is rarely true in real-world applications [34]. The probability model for a classifier is a conditional model
\text{Pr}\left(C|{F}_{1},...,{F}_{n}\right)
where F_{1}, ..., F_{n} are attributes, and C is a class variable. By Bayes' theorem, it equals
\text{Pr}\left(C|{F}_{1},...,{F}_{n}\right)=\frac{\text{Pr}\left(C\right)\text{Pr}\left({F}_{1},...,{F}_{n}|C\right)}{\text{Pr}\left({F}_{1},...,{F}_{n}\right)}
The denominator is effectively constant, and the numerator is equivalent to the joint probability model
\text{Pr}\left(C,{F}_{1},...,{F}_{n}\right)
The naive Bayesian classifier assumes that the features are conditionally independent given the class, that is, for i ≠ k,
\text{Pr}\left({F}_{i}|C,{F}_{k}\right)=\text{Pr}\left({F}_{i}|C\right)
Therefore, the probability of a class C given features F_{1}, ..., F_{n} is
\text{Pr}\left(C|{F}_{1},...,{F}_{n}\right)=A\,\text{Pr}\left(C\right)\prod_{i=1}^{n}\text{Pr}\left({F}_{i}|C\right) (3)
where A=\frac{1}{\text{Pr}\left({F}_{1},...,{F}_{n}\right)} is a constant.
The weighting function of BPLT model
In this section, we describe the Bayesian-based Personalized Laboratory Tests recommendation (BPLT) model, which was proposed in our previous work [1]. More details are given in this paper. The purpose of the BPLT model is to classify the laboratory tests for individual patients according to their personal conditions. In the real world, it is often easy to obtain information on the patients' previous laboratory tests. Therefore, the BPLT model recommends additional new laboratory tests to patients, given the laboratory tests that the patients have already taken.
Suppose we have a set of M laboratory tests T = {test_{1}, test_{2}, ..., test_{M}}, and a patient p_{j} who has taken tests T_{j} = {test_{j,1}, ..., test_{j,k_j}}, where test_{j,i} ∈ T for all 1 ≤ i ≤ k_{j}. We denote the events describing whether each test in T is taken by p_{j} as F_{j,1}, F_{j,2}, ..., F_{j,M}. For example, if T contains 7 tests and p_{j} has taken test_{3}, test_{5} and test_{7}, this is represented as (F_{j,1}, F_{j,2}, ..., F_{j,7}) = (0, 0, 1, 0, 1, 0, 1). A Bayesian classifier is employed to evaluate the association between p_{j} and a new test test_{0}, where test_{0} ∈ T and test_{0} ∉ T_{j}. We use F_{j,0} to represent the event that p_{j} should take test_{0}, and {F}_{j,0}^{c} to represent the event that p_{j} should not take test_{0}. By Formula (3), the probability of F_{j,0} given F_{j,1}, F_{j,2}, ..., F_{j,M} is
\text{Pr}\left({F}_{j,0}|{F}_{j,1},...,{F}_{j,M}\right)=A\,\text{Pr}\left({F}_{j,0}\right)\prod_{i=1}^{M}\text{Pr}\left({F}_{j,i}|{F}_{j,0}\right)
The probability of {F}_{j,0}^{c} given F_{j,1}, F_{j,2}, ..., F_{j,M} is
\text{Pr}\left({F}_{j,0}^{c}|{F}_{j,1},...,{F}_{j,M}\right)=A\,\text{Pr}\left({F}_{j,0}^{c}\right)\prod_{i=1}^{M}\text{Pr}\left({F}_{j,i}|{F}_{j,0}^{c}\right)
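In the BPLT model, we reward the tests with a high probability of "taken" and a low probability of "not-taken". The correlation between a new test test_{0} and a given patient p_{j} is given in Definition 1 [1].
Definition 1 The correlation between a new test test_{0} and a given patient p_{j} is defined as the logarithm of the probability that p_{j} should take test_{0} divided by the probability that p_{j} should not take test_{0}, given F_{j,1}, F_{j,2}, ..., F_{j,M}.
\text{corr}\left(tes{t}_{0},{p}_{j}\right)=\text{log}\frac{\text{Pr}\left({F}_{j,0}|{F}_{j,1},...,{F}_{j,M}\right)}{\text{Pr}\left({F}_{j,0}^{c}|{F}_{j,1},...,{F}_{j,M}\right)}
We can see that a higher value of corr(test_{0}, p_{j}) indicates that test_{0} is more strongly associated with p_{j}. Since the constant A of Formula (3) cancels in the ratio, the calculation of corr(test_{0}, p_{j}) can be further simplified as follows
\text{corr}\left(tes{t}_{0},{p}_{j}\right)=\text{log}\frac{\text{Pr}\left({F}_{j,0}\right)}{\text{Pr}\left({F}_{j,0}^{c}\right)}+\sum_{i=1}^{M}\text{log}\frac{\text{Pr}\left({F}_{j,i}|{F}_{j,0}\right)}{\text{Pr}\left({F}_{j,i}|{F}_{j,0}^{c}\right)} (5)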
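Moreover, a test belongs to either the "taken" class or the "not-taken" class. Thus, the following two formulas hold.
\text{Pr}\left({F}_{j,0}\right)+\text{Pr}\left({F}_{j,0}^{c}\right)=1
\text{Pr}\left({F}_{j,i}\right)=\text{Pr}\left({F}_{j,i}|{F}_{j,0}\right)\text{Pr}\left({F}_{j,0}\right)+\text{Pr}\left({F}_{j,i}|{F}_{j,0}^{c}\right)\text{Pr}\left({F}_{j,0}^{c}\right)
from which we can obtain \text{Pr}\left({F}_{j,0}^{c}\right) and \text{Pr}\left({F}_{j,i}|{F}_{j,0}^{c}\right)
\text{Pr}\left({F}_{j,0}^{c}\right)=1-\text{Pr}\left({F}_{j,0}\right)
\text{Pr}\left({F}_{j,i}|{F}_{j,0}^{c}\right)=\frac{\text{Pr}\left({F}_{j,i}\right)-\text{Pr}\left({F}_{j,i}|{F}_{j,0}\right)\text{Pr}\left({F}_{j,0}\right)}{1-\text{Pr}\left({F}_{j,0}\right)}
Thus \text{Pr}\left({F}_{j,0}^{c}\right) and \text{Pr}\left({F}_{j,i}|{F}_{j,0}^{c}\right) in (5) can be eliminated from corr(test_{0}, p_{j}), as shown below
\text{corr}\left(tes{t}_{0},{p}_{j}\right)=\text{log}\frac{\text{Pr}\left({F}_{j,0}\right)}{1-\text{Pr}\left({F}_{j,0}\right)}+\sum_{i=1}^{M}\text{log}\frac{\text{Pr}\left({F}_{j,i}|{F}_{j,0}\right)\left(1-\text{Pr}\left({F}_{j,0}\right)\right)}{\text{Pr}\left({F}_{j,i}\right)-\text{Pr}\left({F}_{j,i}|{F}_{j,0}\right)\text{Pr}\left({F}_{j,0}\right)}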
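The joint probability that patient p_{j} takes both of the tests test_{i} and test_{0} is
\text{Pr}\left({F}_{j,i},{F}_{j,0}\right)=\text{Pr}\left({F}_{j,i}|{F}_{j,0}\right)\text{Pr}\left({F}_{j,0}\right)
The correlation between test_{0} and p_{j} then becomes
\text{corr}\left(tes{t}_{0},{p}_{j}\right)=\text{log}\frac{\text{Pr}\left({F}_{j,0}\right)}{1-\text{Pr}\left({F}_{j,0}\right)}+\sum_{i=1}^{M}\text{log}\frac{\text{Pr}\left({F}_{j,i}|{F}_{j,0}\right)\left(1-\text{Pr}\left({F}_{j,0}\right)\right)}{\text{Pr}\left({F}_{j,i}\right)-\text{Pr}\left({F}_{j,i},{F}_{j,0}\right)}
which leads to the following Definition 2 [1].
Definition 2 The weighting function for a laboratory test test_{0} for a patient p_{j} is the simplified correlation between test_{0} and p_{j}
w\left(tes{t}_{0},{p}_{j}\right)=\text{log}\frac{\alpha}{1-\alpha}+\sum_{i=1}^{M}\text{log}\frac{{\beta}_{j,i}\left(1-\alpha \right)}{{\gamma}_{j,i}-\alpha {\beta}_{j,i}} (6)
where
\alpha =\text{Pr}\left({F}_{j,0}\right), {\beta}_{j,i}=\text{Pr}\left({F}_{j,i}|{F}_{j,0}\right), {\gamma}_{j,i}=\text{Pr}\left({F}_{j,i}\right)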
The new laboratory tests are ranked in a list according to w(test_{0}, p_{j}) for a given patient p_{j}. In a later section, we present the evaluation environment for the laboratory test ranking list.
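The scoring and ranking step can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation. It assumes the weight takes the log-odds form obtained by substituting Pr(F^c_{j,0}) = 1 - Pr(F_{j,0}) and the law of total probability into Definition 1, with `alpha` = Pr(F_{j,0}), `betas` = Pr(F_{j,i}|F_{j,0}) and `gammas` = Pr(F_{j,i}). Reading γ_{j,i} as the marginal probability Pr(F_{j,i}) is our interpretation, as is summing only over the tests the patient has already taken. The function names, test names and probabilities are hypothetical.

```python
import math


def bplt_weight(alpha, betas, gammas):
    """Weight w(test_0, p_j) of one candidate test for one patient.

    alpha  -- estimate of Pr(F_{j,0}), the prior probability of the candidate test
    betas  -- estimates of Pr(F_{j,i} | F_{j,0}), one per test i the patient has taken
    gammas -- estimates of Pr(F_{j,i}), one per test i the patient has taken
    """
    # Prior log-odds of "taken" versus "not-taken".
    w = math.log(alpha / (1.0 - alpha))
    for beta, gamma in zip(betas, gammas):
        # Pr(F_i | F_0^c) = (gamma - beta * alpha) / (1 - alpha) by total probability.
        w += math.log(beta * (1.0 - alpha) / (gamma - beta * alpha))
    return w


def rank_tests(candidates):
    """Sort candidate tests by descending weight.

    candidates maps a test id to an (alpha, betas, gammas) triple.
    """
    scored = {t: bplt_weight(*args) for t, args in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical probabilities, for illustration only.
candidates = {
    "glucose_fasting": (0.2, [0.5], [0.3]),
    "lipid_panel": (0.2, [0.3], [0.3]),
}
print(rank_tests(candidates))
```

In this toy input both candidates have the same prior, so the ordering is decided by how much more often the patient's existing test co-occurs with each candidate than its marginal frequency would suggest. The sketch fails with a math domain error when `alpha` or a `beta` is zero, which is the situation that motivates smoothing.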
An advanced model: BPLT^{+}
To obtain a more robust and better-performing model, we further propose an advanced model, BPLT^{+}, which improves the BPLT model with several smoothing techniques. There are two reasons for smoothing BPLT. One reason is that smoothing is a way to deal with noise in the data. The other is to avoid mathematically undefined expressions. When the laboratory test test_{0} has not been observed in previous visits, which means α = 0, the first part of formula (6) is undefined, since it takes the logarithm of zero. Likewise, when the joint frequency of two laboratory tests is zero, which means β_{j,i} = 0, the second part of (6) is undefined. Therefore, we introduce smoothing techniques to further improve the BPLT model.
Smoothing techniques
In statistics, smoothing [27] is a technique to create an approximating function that attempts to capture important patterns in the data, while leaving out noise or other fine-scale structures/rapid phenomena. The main purpose of smoothing in this paper is to assign a non-zero probability to the unseen tests and to improve the accuracy of test probability estimation in general.
The smoothing techniques are discussed based on the following definition of a conditional probability [28].
\text{Pr}\left(t|p\right)=\frac{c\left(t;p\right)}{{\sum}_{t\in T}c\left(t;p\right)}
where c(t;p) is the count of a patient taking a test. Since we have defined a ranking problem, which is similar to the problems in Information Retrieval (IR), we use some smoothing methods that are widely used for language models in IR. The general form of a smoothed model [35] is assumed to be the following:
where Pr_{t}(t|p) is the smoothed probability of a test t given the patient with existing tests, and Pr(t|C) is the probability of a test t given the whole data set.
A smoothing method may be as simple as adding an extra count to every test, which is called additive or Laplace smoothing, or more sophisticated as in Katz smoothing, where tests of different count are treated differently. Three representative methods that are popular and effective are:

The Jelinek-Mercer method
\underset{\lambda}{\text{Pr}}\left(t|p\right)=\left(1-\lambda \right)\text{Pr}\left(t|p\right)+\lambda \text{Pr}\left(t|C\right) (9)
where λ is a balancing parameter that ranges from 0 to 1.

Bayesian smoothing using Dirichlet priors
\underset{{\mu}_{0}}{\text{Pr}}\left(t|p\right)=\frac{c\left(t;p\right)+{\mu}_{0}\text{Pr}\left(t|C\right)}{{\sum}_{t\in T}c\left(t;p\right)+{\mu}_{0}} (10)
where µ_{0} is a balancing parameter, and µ_{0} > 0. The Laplace method is a special case of this technique.

Absolute discounting
\underset{\delta}{\text{Pr}}\left(t|p\right)=\frac{\text{max}\left(c\left(t;p\right)-\delta ,0\right)}{{\sum}_{t\in T}c\left(t;p\right)}+\sigma \text{Pr}\left(t|C\right) (11)
where δ ∈ [0, 1] is a discount constant and σ = δp_{u}/p, so that all probabilities sum to one. Here p_{u} is the number of unique tests taken by the patient and p is the patient's total test count; in the IR setting they correspond to the number of unique terms in a document and the total count of words in the document.
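The three estimators above can be compared side by side with a small Python sketch. It assumes the maximum likelihood estimate Pr(t|p) = c(t;p) / Σ c(t;p); the function names, the toy counts and the collection probabilities are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def jelinek_mercer(count, total, p_coll, lam):
    """Formula (9): interpolate the patient estimate with the collection estimate."""
    return (1.0 - lam) * (count / total) + lam * p_coll


def dirichlet_prior(count, total, p_coll, mu0):
    """Formula (10): add mu0 pseudo-counts distributed according to Pr(t|C)."""
    return (count + mu0 * p_coll) / (total + mu0)


def absolute_discounting(count, total, p_coll, delta, n_unique):
    """Formula (11): subtract delta from each seen count and redistribute the mass."""
    sigma = delta * n_unique / total
    return max(count - delta, 0.0) / total + sigma * p_coll


# Hypothetical patient: test "a" counted 3 times, "b" once, "c" never.
counts = {"a": 3, "b": 1, "c": 0}
p_coll = {"a": 0.5, "b": 0.3, "c": 0.2}  # hypothetical Pr(t|C), sums to one
total = sum(counts.values())
n_unique = sum(1 for c in counts.values() if c > 0)

jm = {t: jelinek_mercer(counts[t], total, p_coll[t], lam=0.2) for t in counts}
dp = {t: dirichlet_prior(counts[t], total, p_coll[t], mu0=2.0) for t in counts}
ad = {t: absolute_discounting(counts[t], total, p_coll[t], 0.5, n_unique) for t in counts}

for name, dist in (("Jelinek-Mercer", jm), ("Dirichlet", dp), ("Absolute discounting", ad)):
    print(name, dist, "sum =", sum(dist.values()))
```

In each case the unseen test "c" receives a non-zero probability and the smoothed values still sum to one, which is the property the choice of σ in (11) is meant to guarantee.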
BPLT^{+} with smoothing techniques
There are two parts of formula (6) that need smoothing. The first is the conditional probability β_{j,i} = Pr(F_{j,i}|F_{j,0}). Its smoothed forms are as follows:

BPLT^{+} with Jelinek-Mercer
{\beta}_{j,i}^{\lambda}=\left(1-\lambda \right){\beta}_{j,i}+\lambda {\gamma}_{j,i} (12)
BPLT^{+} with Dirichlet priors
{\beta}_{j,i}^{\mu}=\frac{{\beta}_{j,i}+\mu {\gamma}_{j,i}}{1+\mu} (13)
BPLT^{+} with absolute discounting
{\beta}_{j,i}^{\delta}=\frac{\text{max}\left(c\left(t;p\right)-\delta ,0\right)}{{\sum}_{t\in T}c\left(t;p\right)}+\delta {\gamma}_{j,i} (14)
In Jelinek-Mercer BPLT^{+} and Absolute Discounting BPLT^{+}, we use the existing smoothing methods. The smoothing parameters λ and δ are within the range [0, 1]. In Dirichlet Priors BPLT^{+}, we modify the Dirichlet smoothing technique by dividing both the numerator and the denominator in (10) by {\sum}_{t\in T}c\left(t;p\right), and normalize the parameter µ to the range [0, 1], where \mu =\frac{{\mu}_{0}}{{\sum}_{t\in T}c\left(t;p\right)}.
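The normalization step for the Dirichlet case can be written out explicitly. The derivation below is a sketch that assumes the maximum likelihood estimate Pr(t|p) = c(t;p) / Σ c(t;p); identifying Pr(t|p) with β_{j,i} and Pr(t|C) with γ_{j,i} in the last line is our reading of how (10) maps onto (13).

```latex
\begin{aligned}
\underset{\mu_0}{\text{Pr}}(t|p)
  &= \frac{c(t;p) + \mu_0\,\text{Pr}(t|C)}{\sum_{t\in T} c(t;p) + \mu_0} \\
  &= \frac{\dfrac{c(t;p)}{\sum_{t\in T} c(t;p)}
           + \dfrac{\mu_0}{\sum_{t\in T} c(t;p)}\,\text{Pr}(t|C)}
          {1 + \dfrac{\mu_0}{\sum_{t\in T} c(t;p)}} \\
  &= \frac{\text{Pr}(t|p) + \mu\,\text{Pr}(t|C)}{1 + \mu},
  \qquad \mu = \frac{\mu_0}{\sum_{t\in T} c(t;p)}
\end{aligned}
```

Replacing Pr(t|p) by β_{j,i} and Pr(t|C) by γ_{j,i} gives the form of (13). Note that µ = 0 recovers the unsmoothed β_{j,i}, while µ = 1 weights β_{j,i} and γ_{j,i} equally.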
Another part of formula (6) that needs smoothing is \text{log}\frac{\alpha}{1-\alpha}, which is a simple ratio that can be smoothed via Laplace smoothing as
where θ is a tuning parameter that ranges from 0 to 1.
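To make the effect of θ concrete, the sketch below uses one plausible additive (Laplace-style) form, in which θ is added to both the numerator and the denominator of α/(1-α). This particular form is an assumption made for illustration and should not be read as the paper's formula (15); the function name is hypothetical.

```python
import math


def smoothed_log_odds(alpha, theta):
    """Log-odds of alpha with theta added to numerator and denominator.

    ASSUMPTION: this additive form is illustrative only; it is not taken
    from the paper's formula (15).
    """
    return math.log((alpha + theta) / (1.0 - alpha + theta))


# With theta = 0 the unsmoothed log-odds is recovered for 0 < alpha < 1.
print(smoothed_log_odds(0.2, 0.0))
# With theta > 0 the value stays finite even for a never-observed test (alpha = 0).
print(smoothed_log_odds(0.0, 0.5))
```

Under this assumed form, a larger θ pulls the log-odds toward zero, so the prior frequency of the candidate test contributes less to the ranking.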
Evaluation environments
Datasets
The data sets in our experiments are obtained from Alpha Global IT [1, 36]. Alpha Corporate Group provides laboratory, medical clinic, commercial electronic medical record and practice management software. The data set contains 78 months of patients' laboratory test results. Our experiments use 6 months of results, containing 1,048,575 patient records, as a case study. Records for thousands of patients and more than 400 laboratory tests are included in our experiments. The data format is the same as the example shown in Figure 1. Our data set contains real patient information, such as health card ID, age, gender, date of visit, laboratory test ID and laboratory test results. We only use the patient ID and laboratory test ID attributes in this paper, and analyze the associations among these laboratory tests. In our future work, we will incorporate more attributes into the laboratory test recommendation model.
Validation data and measure
To evaluate BPLT^{+} models objectively, we divide the data set into three components: a training set, a validation input set, and a validation label set. The data set is first randomly split into a training set and a validation set. In this step, we split by patient and do not split the records of the same patient. Then, for the validation set, we randomly remove one test t^{*} from each patient p_{j}, and store t^{*} in the validation label set. The ranked list returned by BPLT^{+} is compared with t^{*} for each patient. To measure this comparison and evaluate the effectiveness of BPLT^{+}, we use CorrectRate_{X} [1], defined below. Suppose the returned laboratory test ranking list is L={t}_{1,j}^{\prime},\dots ,{t}_{l,j}^{\prime}; CorrectRate_{X} checks whether t^{*} appears among the top-ranked tests. The measure is adapted from the Mean Average Precision (MAP) [32] evaluation metric.
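The splitting procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are assumed to be held as a dictionary from patient ID to a list of test IDs, patients with fewer than two tests are skipped in the validation set so that some input remains after removal, and the names `split_dataset`, `train_fraction` and `seed` are hypothetical.

```python
import random


def split_dataset(records, train_fraction=0.6, seed=0):
    """Split by patient, then hold out one random test per validation patient.

    records -- dict mapping patient id to a list of test ids
    Returns (train, val_input, val_label).
    """
    rng = random.Random(seed)
    patients = sorted(records)
    rng.shuffle(patients)
    n_train = int(round(train_fraction * len(patients)))

    # All records of a patient go to exactly one side of the split.
    train = {p: list(records[p]) for p in patients[:n_train]}

    val_input, val_label = {}, {}
    for p in patients[n_train:]:
        tests = list(records[p])
        if len(tests) < 2:
            continue  # assumption: keep only patients with a test left after removal
        held_out = tests.pop(rng.randrange(len(tests)))
        val_input[p] = tests       # what the model sees
        val_label[p] = held_out    # the removed test t*
    return train, val_input, val_label


# Hypothetical data: 10 patients with 3 distinct tests each.
records = {f"p{i}": [f"t{i}", f"t{i + 1}", f"t{i + 2}"] for i in range(10)}
train, val_input, val_label = split_dataset(records)
print(len(train), len(val_input), len(val_label))
```

Fixing the seed makes a split reproducible, which helps when comparing smoothing parameters on the same validation set.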
Definition 3 CorrectRate_{X} evaluates the accuracy of a laboratory test prediction system. It is the number of patients whose desired (gold standard) test matches one of the top X tests generated by the system, divided by the total number of patients.
\text{CorrectRate}_{X}=\frac{1}{n}\sum_{j=1}^{n}{h}_{X}\left({p}_{j}\right)
where
{h}_{X}\left({p}_{j}\right) equals 1 if the removed test t^{*} of p_{j} is among {t}_{1,j}^{\prime},\dots ,{t}_{X,j}^{\prime} and 0 otherwise,
n is the number of patients, and X is a parameter indicating how many top-ranked tests are compared with the gold standard test t^{*}; X is set to 1 or 3 in this paper.
We present an example in Table 1 to show how CorrectRate_{X} evaluates the model. Suppose the laboratory test set includes 200 tests and there are 5 patients in the validation set. As introduced above, the BPLT^{+} model returns a ranked list for each patient. Here ">" indicates that the weight of the left-side laboratory test is higher than the weight of the right-side laboratory test. In our example, 2 out of 5 patients have the desired test t^{*} ranked in the top 1 position of the list, so CorrectRate_{1} equals 0.4. And 4 out of 5 patients have t^{*} appear within the top 3 positions of the returned ranking list, so CorrectRate_{3} equals 0.8. Since the top 3 positions include the top 1 position, the following statement is always true: CorrectRate_{1} ≤ CorrectRate_{3}.
BPLT^{+} system framework
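The metric is simple enough to state directly in code. The sketch below follows the verbal definition of CorrectRate_{X} and reproduces the counts of the worked example (2 of 5 hits at the top 1 position, 4 of 5 within the top 3). The ranked lists and test IDs are invented for illustration and are not the contents of Table 1.

```python
def correct_rate(ranked_lists, labels, x):
    """Fraction of patients whose held-out test is among the top x ranked tests.

    ranked_lists -- dict mapping patient id to a list of test ids, best first
    labels       -- dict mapping patient id to the held-out test t*
    """
    hits = sum(1 for p, label in labels.items() if label in ranked_lists[p][:x])
    return hits / len(labels)


# Hypothetical rankings for 5 validation patients.
ranked_lists = {
    "p1": ["t7", "t2", "t9"],
    "p2": ["t4", "t1", "t3"],
    "p3": ["t5", "t8", "t2"],
    "p4": ["t6", "t3", "t1"],
    "p5": ["t2", "t4", "t6", "t9"],
}
labels = {"p1": "t7", "p2": "t4", "p3": "t8", "p4": "t1", "p5": "t9"}

print(correct_rate(ranked_lists, labels, 1))
print(correct_rate(ranked_lists, labels, 3))
```

Because the top 3 positions contain the top 1 position, `correct_rate(..., 1)` can never exceed `correct_rate(..., 3)` on the same input.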
The framework of BPLT^{+} Model is shown in Figure 2. The data set in this framework is abstracted to contain only patient ID and laboratory test ID. The procedures in the proposed framework are described as follows.
• Split: First the data set is randomly split into a training set and a validation set.
• Randomly remove a test as label: Since it is hard to objectively evaluate the performance of the BPLT^{+} model, we further randomly remove a test for each visit of the patients from the validation set. These removed tests are regarded as labels of the validation input set. Our ultimate goal is to recommend the missing test for a patient's visit.
• Build training matrix: To avoid repeatedly calculating the frequency of a test and the joint frequency of two tests, we build a training matrix from the training data. This training matrix contains the frequency of co-occurrences of two laboratory tests. For example, if a patient in the training data did test_{1} and test_{2} together, then we add 1 to F_{12} and F_{21}. We can see that the training matrix is a symmetric matrix.
• BPLT^{+} model: The correlation between a given test_{0} and a patient is calculated based on formula (6).
• Evaluation via CorrectRate_{X}: Finally, the evaluation criterion CorrectRate_{X} checks whether the model made the correct recommendations.
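The training matrix step above can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: co-occurrence is counted once per patient, the matrix is stored sparsely as a dictionary keyed by test pairs, and the probabilities are estimated as relative frequencies, with γ read as the marginal probability of a test. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_training_matrix(train):
    """Count test frequencies and symmetric pairwise co-occurrences.

    train -- dict mapping patient id to an iterable of test ids
    Returns (freq, cooc) where cooc[(a, b)] == cooc[(b, a)].
    """
    freq = defaultdict(int)   # number of patients who took each test
    cooc = defaultdict(int)   # number of patients who took both tests of a pair
    for tests in train.values():
        unique = sorted(set(tests))
        for t in unique:
            freq[t] += 1
        for a, b in combinations(unique, 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1
    return freq, cooc


def estimate_probabilities(freq, cooc, n_patients, test_0, test_i):
    """Relative-frequency estimates (an assumption) of alpha, beta and gamma."""
    alpha = freq[test_0] / n_patients                      # Pr(F_0)
    beta = cooc[(test_i, test_0)] / freq[test_0] if freq[test_0] else 0.0  # Pr(F_i | F_0)
    gamma = freq[test_i] / n_patients                      # Pr(F_i)
    return alpha, beta, gamma


# Hypothetical training data.
train = {"p1": ["t1", "t2"], "p2": ["t1", "t2", "t3"], "p3": ["t3"]}
freq, cooc = build_training_matrix(train)
print(cooc[("t1", "t2")], cooc[("t2", "t1")])
print(estimate_probabilities(freq, cooc, len(train), "t3", "t1"))
```

Once `freq` and `cooc` are built, scoring a validation patient only needs dictionary lookups, which is the point of precomputing the matrix instead of rescanning the training set.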
Results
We first show the overall performance under different training-validation proportions in Table 2 [1]. We randomly take 40%, 50% and 60% of the raw data set as the training data and keep the rest as the validation data. In general, the BPLT^{+} model performs better with a larger training data set. This is because a larger training data set contains more information, so more knowledge can be learned. With the development of computer technology, larger amounts of medical data will be available in practice. Therefore, we use 60% of the data as training data in the rest of this paper. As discussed above, CorrectRate_{3} is never lower than CorrectRate_{1}. In general, the BPLT^{+} model has promising performance with an accuracy of 0.7074 for CorrectRate_{1} and an accuracy of 0.7840 for CorrectRate_{3}.
Then we investigate in detail how the smoothing parameters affect the effectiveness. We first consider smoothing β_{j,i} only. Three smoothing techniques are used to smooth β_{j,i}: Jelinek-Mercer BPLT^{+}, Dirichlet Priors BPLT^{+} and Absolute Discounting BPLT^{+}, with the corresponding parameters λ, µ, δ ∈ [0, 1]. We conduct experiments on these three methods individually. The change of CorrectRate_{1} and CorrectRate_{3} with respect to the parameters is shown in Figure 3, Figure 4, and Figure 5. We can see from the figures that the curve of CorrectRate_{1} is always below the curve of CorrectRate_{3}, which is consistent with our discussion of Definition 3. As the parameters increase from 0.1 to 1, both CorrectRate_{1} and CorrectRate_{3} rise at the beginning because the smoothing component is incorporated. After reaching the maximum value, CorrectRate_{1} and CorrectRate_{3} fall, since the weighting tends to become less patient-specific when too much smoothing is incorporated. All the smoothing parameters achieve their best performance at the value of 0.2. Comparing these three methods, Jelinek-Mercer BPLT^{+} obtains the best performance on both CorrectRate_{1} and CorrectRate_{3}, which are 0.5569 and 0.6167 respectively. In terms of average values, Dirichlet Priors BPLT^{+} has the best average performance on CorrectRate_{3}, and Jelinek-Mercer BPLT^{+} has the best average performance on CorrectRate_{1}.
We further discuss smoothing the other part of (6), \text{log}\frac{\alpha}{1-\alpha}, where the Laplace smoothing parameter is θ. As discussed above, Jelinek-Mercer BPLT^{+} has the best performance on both CorrectRate_{1} and CorrectRate_{3}. We therefore investigate the sensitivity to θ by fixing Jelinek-Mercer BPLT^{+} with λ = 0.2. The results are shown in Figure 6. We can see that CorrectRate_{1} increases as θ increases, while CorrectRate_{3} decreases a little and then increases. Both of them reach their maximum and tend to be stable when θ is greater than 0.5.
Conclusions and future work
An Advanced Bayesian-based Personalized Laboratory Tests recommendation (BPLT^{+}) model is proposed in this paper. Based on the assumption that hidden associations can exist among laboratory tests, we employ a Bayesian approach to build a weighting function for scoring the correlation between a new laboratory test and a patient. To obtain a more robust and better-performing model, we integrate several enhanced smoothing techniques into the BPLT^{+} model. The main purpose of smoothing in this paper is to assign a non-zero probability to the unseen laboratory tests and to improve the accuracy of test probability estimation. In particular, we use three techniques, the Jelinek-Mercer, Dirichlet Priors and Absolute Discounting approaches, to smooth the conditional probability of observing a patient taking an existing test when a new test test_{0} is given (Formulas 12-14). We also use the Laplace method to smooth the log function in the BPLT^{+} model (Formula 15). We conducted experiments to examine the performance of the BPLT^{+} model and the sensitivity of the smoothing parameters. We find that BPLT^{+} is able to make accurate recommendations under proper smoothing parameters.
Further, we propose a novel framework for effectively implementing BPLT^{+} model and objectively testing personalized recommendation systems without human interactions, shown in Figure 2. Based on the real patients' laboratory test data, we randomly generate a training data set, a validation input set and a validation label set. A training matrix containing the laboratory test statistics is calculated from the training data set and stored. For new patients (the validation input set), instead of processing the original training set, we utilize this training matrix to predict the laboratory test on the validation input set, and compare the ranking results with the validation label set.
There are a few future directions for this research. As we can see from the data format in Figure 1, we have not made use of all the attributes. In the future, we would like to conduct a comprehensive investigation of the patients' profiles. For example, we can cluster the patients into groups and investigate the similarities of the patients in the same group. We can also analyze the associations among laboratory test results and thereby further enhance our proposed personalized recommendation model. Moreover, we look forward to testing our proposed models in more real applications.
References
1. Zhao J, Huang JX, Hu X, Kurian J, Melek W: A Bayesian-based prediction model for personalized medical health care. Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on: 4-7 October 2012. 2012, 1-4. 10.1109/BIBM.2012.6392623.
2. Bates D, Cohen M, Leape L, Overhage J, Shabot M, Sheridan T: Reducing the frequency of errors in medicine using information technology. Journal of the American Medical Informatics Association. 2001, 8 (4): 299-308. 10.1136/jamia.2001.0080299.
3. Ogiela L, Tadeusiewicz R, Ogiela M: Cognitive techniques in medical information systems. Computers in Biology and Medicine. 2008, 38 (4): 501-507. 10.1016/j.compbiomed.2008.01.017.
4. Shortliffe E, Cimino J: Biomedical informatics: computer applications in health care and biomedicine. 2006, Springer
5. Melski J, Geer D, Bleich H: Medical information storage and retrieval using preprocessed variables. Computers and Biomedical Research, An International Journal. 1978, 11 (6): 613. 10.1016/0010-4809(78)90038-1.
6. Thoma G, Suthasinekul S, Walker F, Cookson J, Rashidian M: A prototype system for the electronic storage and retrieval of document images. ACM Transactions on Information Systems. 1985, 3 (3): 279-291. 10.1145/4229.4232.
7. Frick S, Uehlinger D, Zenklusen R: Medical futility: Predicting outcome of intensive care unit patients by nurses and doctors - A prospective comparative study. Critical Care Medicine. 2003, 31 (2): 456-461. 10.1097/01.CCM.0000049945.69373.7C.
8. Wu W, Bui A, Batalin M, Au L, Binney J, Kaiser W: MEDIC: Medical embedded device for individualized care. Artificial Intelligence in Medicine. 2008, 42 (2): 137-152. 10.1016/j.artmed.2007.11.006.
9. Kajíc V, Esmaeelpour M, Považay B, Marshall D, Rosin P, Drexler W: Automated choroidal segmentation of 1060 nm OCT in healthy and pathologic eyes using a statistical model. Biomedical Optics Express. 2012, 3 (1): 86-103. 10.1364/BOE.3.000086.
10. Kokol P, Pohorec S, Štiglic G, Podgorelec V: Evolutionary design of decision trees for medical application. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2012, 2 (3): 237-254. 10.1002/widm.1056.
11. Pepe M: The statistical evaluation of medical tests for classification and prediction. 2004, Oxford University Press, USA
12. Rohian H, An A, Zhao J, Huang X: Discovering temporal associations among significant changes in gene expression. Proceedings of IEEE International Conference on Bioinformatics and Biomedicine, IEEE. 2009, 419-423.
13. Lupu M, Huang XJ, Zhu J: TREC Chemical Information Retrieval - An Initial Evaluation Effort for Chemical IR Systems. World Patent Information Journal. 2011, 33 (3): 248-256. 10.1016/j.wpi.2011.03.002.
14. Zhao J, Huang X, Ye Z, Zhu J: York University at TREC 2009: Chemical Track. Proceedings of the 18th Text REtrieval Conference. 2009
15. Bernardo J, Smith A: Bayesian theory. Measurement Science and Technology. 2001, 12: 221-222.
16. Chen J, Huang H, Tian F, Tian S: A selective bayes classifier for classifying incomplete data based on gain ratio. Knowledge-Based Systems. 2008, 21 (7): 530-534. 10.1016/j.knosys.2008.03.013.
17. Clèries R, Ribes J, Buxo M, Ameijide A, Marcos-Gragera R, Galceran J, Martínez J, Yasui Y: Bayesian approach to predicting cancer incidence for an area without cancer registration by using cancer incidence data from nearby areas. Statistics in Medicine. 2012
18. Huang X, Hu Q: A Bayesian Learning Approach to Promoting Diversity in Ranking for Biomedical Information Retrieval. Proceedings of the 32nd Annual International Conference on Research and Development in Information Retrieval. 2009, 19-23.
19. Liechty J, Liechty M, Muller P: Bayesian correlation estimation. Biometrika. 2004, 91: 1. 10.1093/biomet/91.1.1.
20. Babashzadeh A, Daoud M, Huang J: Using semantic-based association rule mining for improving clinical text retrieval. Health Information Science. 2013, 186-197.
21. Raymer M, Doom T, Kuhn L, Punch W: Knowledge discovery in medical and biological datasets using a hybrid bayes classifier/evolutionary algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B. 2003, 33 (5): 802-813. 10.1109/TSMCB.2003.816922.
22. Martín J, Pérez C, Muller P: Bayesian robustness for decision making problems: Applications in medical contexts. International Journal of Approximate Reasoning. 2009, 50 (2): 315-323. 10.1016/j.ijar.2008.03.017.
23. Hu Q, Huang X: Passage Extraction and Result Combination for Genomics Information Retrieval. Journal of Intelligent Information Systems. 2010, 34 (3): 249-274. 10.1007/s10844-009-0097-4.
24. Zhao J, Huang JX, He B: CRTER: using cross terms to enhance probabilistic information retrieval. Proceedings of the 34th international ACM SIGIR conference, ACM. 2011, 155-164.
25. Yin X, Huang JX, Li Z, Zhou X: A Survival Modeling Approach to Biomedical Search Result Diversification Using Wikipedia. IEEE Transactions on Knowledge and Data Engineering. 2013, 25 (6): 1201-1212.
26. Liu Y, Huang JX, An A: Personalized recommendation with adaptive mixture of markov models. Journal of the American Society for Information Science and Technology. 2007, 58 (12): 1851-1870. 10.1002/asi.20631.
27. Titterington D: Common structure of smoothing techniques in statistics. International Statistical Review/Revue Internationale de Statistique. 1985, 141-170.
28. Zhai C, Lafferty J: A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems. 2004, 22 (2): 179-214. 10.1145/984321.984322.
29. Kononenko I: Machine learning for medical diagnosis: history, state of the art and perspective. Artificial Intelligence in Medicine. 2001, 23: 89-109. 10.1016/S0933-3657(01)00077-X.
30. Donabedian A: Evaluating the quality of medical care. Milbank Quarterly. 2005, 83 (4): 691. 10.1111/j.1468-0009.2005.00397.x.
31. Cook N: Statistical evaluation of prognostic versus diagnostic models: beyond the ROC curve. Clinical chemistry. 2008, 54: 17
32. Sanderson M: Information retrieval system evaluation: effort, sensitivity, and reliability. Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, ACM. 2005, 162-169.
33. Kononenko I: Inductive and bayesian learning in medical diagnosis. Applied Artificial Intelligence. 1993, 7 (4): 317-337. 10.1080/08839519308949993.
34. Zhang H, Su J: Naive bayesian classifiers for ranking. Machine Learning: ECML. 2004, 501-512.
35. Chen S, Goodman J: An empirical study of smoothing techniques for language modeling. Computer Speech and Language. 1999, 13 (4): 359-394. 10.1006/csla.1999.0128.
36. Alpha Global IT: [http://www.alphait.com/]
Acknowledgements
This research is supported in part by the research grant from the Natural Sciences & Engineering Research Council (NSERC) of Canada and the Early Research Award/Premier's Research Excellence Award. The authors thank Dr. Joseph Kurian and Dr. William Melek from Alpha Global IT for their help and providing the data. In particular, we thank anonymous reviewers for their valuable and detailed comments on this paper.
Based on "A Bayesian-based prediction model for personalized medical health care", by Jiashu Zhao, Jimmy Xiangji Huang, Xiaohua Hu, C Joseph Kurian, and William Melek, which appeared in Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on. ©2012 IEEE, 579-582.
Declarations
The publication costs for this article were funded by the corresponding author.
This article has been published as part of BMC Genomics Volume 14 Supplement S4, 2013: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2012: Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/14/S4.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
JZ proposed the BPLT^{+} model, carried out the experiments and drafted the manuscript. JXH supervised the project and revised the manuscript. JXH also contributed to the study design and experiments. XH provided useful feedback. All authors read and approved the final manuscript.
Rights and permissions
Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( https://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Zhao, J., Huang, J.X. & Hu, X. BPLT^{+}: A Bayesian-based personalized recommendation model for health care. BMC Genomics 14 (Suppl 4), S6 (2013). https://doi.org/10.1186/1471-2164-14-S4-S6
DOI: https://doi.org/10.1186/1471-2164-14-S4-S6
Keywords
 Smoothing Parameter
 Mean Average Precision
 Bayesian Classifier
 Ranking List
 Smoothing Technique