Subjects: Residents of Mashhad city aged between 35 and 65 years. Data were collected by stratified-cluster sampling, with more than 7603 people recruited randomly from Mashhad city. Of these, 682 were patients with cardiovascular disease.
3.1. Sampling Method
This study was based on a cohort survey approved by Mashhad University of Medical Sciences under code 85134 (the Mashhad Study). It was conducted on 7600 people who were selected randomly from the urban population of Mashhad by stratified-cluster sampling. Data were gathered through door-to-door visits. The statistical agent of the study explained the plan to each person face to face, and written consent was obtained from those who wished to participate. At this stage, the family list, including residence address, cluster number, telephone number, age, and the number of people living in the house, was recorded, and a complete map of the selected cluster was prepared. Participants then received an invitation to enter the study, and a time was set for the visit, for completing the questionnaire, and for carrying out the required examinations and tests.
3.2. Sample Size
Approximately 7600 people from the Mashhad Study sample participated in this study. Initial observations showed that 682 people had cardiovascular disease, which was sufficient to achieve the research goals. At most two persons of opposite sex, aged more than 35 and less than 65 years, were recruited from each family in the study population. If a family had more than one eligible member of the same sex, only one of them was randomly assigned to the study.
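The within-family selection rule described above can be sketched as follows. This is only an illustration: the record layout (name, sex, age) and the function name are hypothetical, and the strict age bounds follow the wording "more than 35 and lower than 65".

```python
import random

def select_participants(household, rng, min_age=35, max_age=65):
    """Pick at most one eligible man and one eligible woman from a household.

    `household` is a list of (name, sex, age) tuples; this layout is a
    hypothetical one chosen for the illustration.
    """
    selected = []
    for sex in ("male", "female"):
        eligible = [p for p in household
                    if p[1] == sex and min_age < p[2] < max_age]
        if eligible:
            # If several members of one sex qualify, draw one at random.
            selected.append(rng.choice(eligible))
    return selected

household = [
    ("A", "male", 58),
    ("B", "male", 41),
    ("C", "female", 39),
    ("D", "female", 22),  # outside the age range, never eligible
]
chosen = select_participants(household, random.Random(0))
print(chosen)
```

Passing a seeded `random.Random` instance keeps the draw reproducible, which is convenient when the selection has to be documented.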
3.3. Plan Methodology
We identified important risk factors by reviewing the available literature. We then studied the Logit and Probit models and the different methods for estimating their parameters. We examined these two models critically from a theoretical standpoint, while keeping in mind other methods that could be used to analyze the data. We then examined the Mashhad Study data and analyzed the required variables with R software. Before estimating the model, it was necessary to check the independent variables for collinearity. To do so, we used two indices. Tolerance is an indicator that shows how much of the variability of a given independent variable is not explained by the other independent variables in the model (4).
In this study we investigated both quantitative and qualitative variables. Age, height, weight, BMI, waist circumference, systolic blood pressure, diastolic blood pressure, hdl-c, ldl-c, TG, total cholesterol (TC), and hs-CRP were quantitative variables. Sex, occupation, marital status, smoking, and history of disease were qualitative variables.
3.4. Sensitivity and Specificity
To evaluate and compare variables that are categorized into two states, such as the heart disease variable (sick or not sick), we used two indices, sensitivity and specificity. When data can be divided into positive and negative groups, the accuracy of the results can be described by these two indices. Sensitivity is the probability of a positive test among those who have the disease. Specificity is the probability of a negative test among those who do not have the disease.
True positive: a patient is correctly diagnosed as ill.
False positive: a healthy person is wrongly diagnosed as a patient.
True negative: a healthy person is correctly diagnosed as healthy.
False negative: a patient is wrongly diagnosed as healthy.
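As a minimal sketch, the two indices can be computed from these four counts. The counts below are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def sensitivity(tp, fn):
    """Share of truly ill people whom the test flags as positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of truly healthy people whom the test flags as negative."""
    return tn / (tn + fp)

# Hypothetical counts (not taken from the study data).
tp, fn = 90, 10    # 100 people with the disease
tn, fp = 960, 40   # 1000 people without the disease

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```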
The sensitivity and specificity of a test depend on the nature of the test and on the test sample. However, a test result cannot be interpreted from sensitivity and specificity alone (5). For example, if a blood test is positive and the test has 90% sensitivity and 96% specificity, the physician still cannot determine how likely it is that the patient is truly infected. For this purpose, the positive predictive value (PPV) should be used (or the negative predictive value, NPV, if the test result is negative). The predictive value of a test depends on the prevalence of the tested condition in the statistical population, not only on the nature of the test and the sample.
Regarding collinearity, low values of the tolerance index (less than 0.1) for a variable indicate multicollinearity with some of the other variables. The variance inflation factor (VIF) is the reciprocal of the tolerance (6). VIF values greater than 10 indicate collinearity. The collinearity results are shown in Table 1. According to Table 1, height, weight, and body mass index showed collinearity, so it was necessary to remove one of these three variables. We removed the weight variable and examined the collinearity among the variables again.
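To illustrate the earlier point that predictive values depend on prevalence, the sketch below applies Bayes' rule to the test with 90% sensitivity and 96% specificity. The three prevalence values are assumptions chosen for illustration and are not estimates from the study.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sens * prevalence
    false_neg = (1 - sens) * prevalence
    true_neg = spec * (1 - prevalence)
    false_pos = (1 - spec) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Illustrative prevalence values (assumptions, not study estimates).
for prevalence in (0.01, 0.09, 0.30):
    ppv, npv = predictive_values(0.90, 0.96, prevalence)
    print(f"prevalence = {prevalence:.2f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With the same test, the PPV is roughly 0.19 at 1% prevalence and roughly 0.91 at 30% prevalence, which is why a positive result cannot be interpreted from sensitivity and specificity alone.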
Table 1. Logit Model Results
Coefficients                   Odds Ratio   Confidence Interval of the Odds Ratio
                                            Lower Bound   Upper Bound
Intercept                      0.00
Gender, male                   4.23         3.99          4.52
Age                            2.70         2.10          3.10
High-density lipoprotein       0.98         0.97          0.99
Systolic blood pressure        3.80         3.20          4.12
BMI                            3.42         3.00          3.62
Smoking                        2.01         1.83          2.23
Cholesterol                    1.50         1.00          1.61
Reactive protein hs-CRP        1.70         1.10          2.10
Disease                        7.21         7.00          7.60
3.5. Experimental Results
According to Table 1, the collinearity problem was solved by removing the weight variable. Afterward, sensitivity and specificity were 99% and 98%, respectively. According to Table 2 and the area under the ROC curve (0.745) in Figure 1, the resulting classification scheme is acceptable.
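For readers unfamiliar with the area under the ROC curve, the sketch below computes it as the probability that a randomly chosen patient receives a higher predicted score than a randomly chosen healthy person (ties count one half). The labels and scores are hypothetical, and this is not necessarily how the value of 0.745 was obtained in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = disease) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a classifier that ranks patients no better than chance, and 1.0 to a perfect ranking.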
Table 2. Results of Multicollinearity
Variables                      Tolerance    VIF a
Gender                         0.326        3.063
Age                            0.665        1.504
Education                      0.823        1.215
Job                            0.739        1.354
Marital status                 0.929        1.076
Stature                        0.026        38.482
Weight                         0.011        91.615
Waist                          0.362        2.759
Low-density lipoprotein        0.169        5.912
Systolic blood pressure        0.363        2.758
Diastolic blood pressure       0.390        2.561
BMI                            0.012        81.503
Disease                        0.802        1.248
Smoking                        0.691        1.447
High-density lipoprotein       0.762        1.312
Cholesterol                    0.152        6.578
Reactive protein               0.965        1.036
a Abbreviation: VIF, variance inflation factor.
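The relation between the two columns of this table can be checked with a short sketch: VIF is the reciprocal of tolerance, and a variable is flagged when its tolerance is below 0.1 or its VIF exceeds 10. The function names are hypothetical; the tolerance values are a subset of those in the table, and because they are rounded to three decimals, 1/tolerance differs slightly from the tabulated VIF.

```python
def vif(tolerance):
    """Variance inflation factor as the reciprocal of tolerance."""
    return 1.0 / tolerance

def is_collinear(tolerance, tol_cutoff=0.1, vif_cutoff=10.0):
    """Flag a variable using the tolerance < 0.1 or VIF > 10 rule."""
    return tolerance < tol_cutoff or vif(tolerance) > vif_cutoff

# A subset of the tolerance values reported in the multicollinearity table.
tolerances = {
    "Stature": 0.026,
    "Weight": 0.011,
    "BMI": 0.012,
    "Age": 0.665,
    "Cholesterol": 0.152,
}

flagged = sorted(name for name, t in tolerances.items() if is_collinear(t))
print(flagged)  # the height, weight and BMI group
```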
Figure 1. ROC curve (area under the curve, 0.745)
Table 2 indicates that gender (sex), age, high-density lipoprotein (hdl-c), systolic blood pressure (sys-bp), body mass index (BMI), smoking, cholesterol (TC), and hs-CRP have significant effects on the risk of cardiovascular disease. Education, marital status, past medical history, height, waist circumference, diastolic blood pressure, and low-density lipoprotein did not have a significant effect on cardiovascular disease. The negative coefficient of the hdl-c variable indicates that increasing high-density lipoprotein decreases the risk of cardiovascular disease, whereas the positive coefficient of the BMI variable shows that increasing BMI can increase the risk.
3.6. Estimating the Logit Model to Identify Cardiovascular Disease Risk Factors in Mashhad City
The Logit model was used to examine the effect of age, gender (sex), occupation, education, marital status, height, waist circumference, smoking, history of disease, systolic blood pressure, diastolic blood pressure, body mass index (BMI), hdl-c, ldl-c, cholesterol (TC), and hs-CRP on cardiovascular disease. To assess the suitability of the model, we examined the sensitivity and specificity indices and the ROC curve (Figure 1, Table 2).
As mentioned earlier, the odds ratio is interpreted as follows:
A. Values greater than one indicate a greater chance of success than failure.
B. Values less than one indicate a lower chance of success than failure.
According to the Logit model, the odds of heart disease are 2.7 times higher in older people than in younger people. In contrast, increasing high-density lipoprotein reduces the risk of cardiovascular disease.
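The link between a Logit coefficient and the reported odds ratio can be sketched as follows: the odds ratio equals the exponential of the coefficient, so a one-unit increase in the predictor multiplies the odds by that factor whatever the baseline. The coefficient is back-calculated here from the reported odds ratio of 2.70, and the baseline linear predictor is a hypothetical value.

```python
import math

def logistic(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    """Odds corresponding to a probability."""
    return p / (1.0 - p)

beta_age = math.log(2.70)  # coefficient implied by the reported odds ratio
eta_base = -2.0            # hypothetical baseline linear predictor

p_base = logistic(eta_base)
p_plus = logistic(eta_base + beta_age)  # predictor increased by one unit

ratio = odds(p_plus) / odds(p_base)
print(f"p_base = {p_base:.3f}, p_plus = {p_plus:.3f}, odds ratio = {ratio:.2f}")
```

Note that the probability itself does not rise by a factor of 2.70; only the odds do, which is why the result is stated in terms of odds.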
3.7. Probit Model Estimation to Identify Factors for Cardiovascular Disease in Mashhad City
A multivariate Probit model was used to examine the effect of age, sex, occupation, smoking, history of disease, systolic blood pressure, diastolic blood pressure, BMI, hdl-c, ldl-c, cholesterol (TC), and hs-CRP on the development of cardiovascular disease. The results are presented in Table 3.
Table 3. The Results of the Probit Model
Coefficients                   Odds Ratio   Confidence Interval of the Odds Ratio
                                            Lower Bound   Upper Bound
Intercept                      3.083        3.02          3.12
Gender, male                   4.217        4.002         4.307
Age                            2.29         2.040         2.34
High-density lipoprotein       0.014        0.009         0.03
Systolic blood pressure        3.2          3.07          3.3
BMI                            2.1          2.080         2.21
Smoking                        2.015        1.8           2.03
Cholesterol                    1.1          1.08          1.2
Reactive protein hs-CRP        1.4          1.08          1.6
Disease                        4.034        3.28          5.81
Education, marital status, past medical history, height, waist circumference, diastolic blood pressure, and low-density lipoprotein did not have a significant effect on cardiovascular disease. According to Table 3, a one-unit increase in the age variable increases the risk of heart disease. We can therefore conclude that the risk of heart disease is 2.29 times higher in older people than in younger people.
3.8. Comparing the Logit and Probit Models to Identify Risk Factors for Cardiovascular Disease in Mashhad City
In the two previous sections, we examined cardiovascular disease risk factors with the Logit and Probit models. Table 3 indicates that the results of the two models are similar. We then compared the deviance of the two models. The deviance of the estimated Logit model was D_L = 443.4289 and that of the estimated Probit model was D_P = 523.4293; therefore, there was no significant difference between them. Statistically, neither model has an advantage over the other.
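As a rough sketch of how such a deviance comparison works, the code below computes the binomial deviance, D = -2 * sum[y log p + (1 - y) log(1 - p)], under a logistic link and under a normal (Probit) link. The outcomes and linear predictors are hypothetical, no model is actually fitted, and the division by 1.6 is only a common rule of thumb for putting Logit and Probit predictors on a comparable scale.

```python
import math

def logit_prob(eta):
    """Logit link: logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit_prob(eta):
    """Probit link: standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def deviance(y, probs):
    """Binomial deviance for 0/1 outcomes and predicted probabilities."""
    return -2.0 * sum(
        math.log(p) if yi == 1 else math.log(1.0 - p)
        for yi, p in zip(y, probs)
    )

# Hypothetical outcomes and fitted linear predictors (illustration only).
y = [1, 0, 1, 1, 0, 0, 1, 0]
eta_logit = [1.2, -0.8, 0.4, 2.0, -1.5, 0.3, -0.2, -2.1]
eta_probit = [e / 1.6 for e in eta_logit]  # rough rescaling, an assumption

d_logit = deviance(y, [logit_prob(e) for e in eta_logit])
d_probit = deviance(y, [probit_prob(e) for e in eta_probit])
print(f"D_L = {d_logit:.4f}, D_P = {d_probit:.4f}")
```

A smaller deviance indicates a closer fit to the observed outcomes, so the two values can be set side by side in the same way as D_L and D_P above.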