## Agreement Between Data Sets

It is often of interest to know whether measurements made by two (sometimes more than two) different observers, or by two different techniques, give similar results.

Note that in each of the three situations in Table 1, the pass percentages are the same for both examiners, and if the two examiners were compared using the usual 2 × 2 test for paired data (McNemar's test), there would be no difference in their performance; yet the agreement between the two examiners differs considerably across the three situations. The key point is that "agreement" quantifies the concordance between the two examiners for each "pair" of scores, not the similarity of their overall pass percentages.

A simple way to look at the relationship between the two sets of ratings is the Spearman rank correlation. Cohen's kappa (κ) calculates inter-observer agreement, taking into account the agreement expected by chance, as follows:

$$\kappa = \frac{P_o - P_e}{1 - P_e}$$

where $P_o$ is the observed proportion of agreement and $P_e$ is the proportion of agreement expected by chance.

Consider a situation in which we want to evaluate the agreement between hemoglobin measurements (in g/dL) made with a bedside hemoglobinometer and with the formal photometric laboratory technique in ten people [Table 3]. The Bland-Altman plot for these data shows the difference between the two methods for each person [Figure 1]. The mean difference between the values is 1.07 g/dL (with a standard deviation of 0.36 g/dL), and the 95% limits of agreement are 0.35 to 1.79 g/dL. This means that, for a given person, the hemoglobin level measured by photometry could be anywhere from 0.35 g/dL to 1.79 g/dL higher than the value measured at the bedside (this is the case for 95% of people; for 5% of individuals, the differences could fall outside these limits). This clearly means that the two techniques cannot be used as substitutes for each other. Importantly, there is no single criterion for acceptable limits of agreement; this is a clinical decision that depends on the variable being measured.
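The 95% limits of agreement in the hemoglobin example are the mean of the paired differences ± 1.96 standard deviations of those differences. The following is a minimal Python sketch of that calculation, not the authors' code: the function names are hypothetical, differences are assumed to be taken as "method A minus method B" (here, photometry minus bedside), and because the Table 3 values are not reproduced here, the usage starts from the summary statistics quoted above.

```python
import statistics


def limits_from_summary(mean_diff, sd_diff, z=1.96):
    """95% limits of agreement from the mean and SD of the paired differences."""
    return mean_diff - z * sd_diff, mean_diff + z * sd_diff


def limits_of_agreement(method_a, method_b, z=1.96):
    """Bland-Altman summary for paired measurements (hypothetical helper).

    Differences are method_a - method_b, so a positive mean difference
    means method_a reads higher on average.
    Returns (mean difference, SD of differences, lower limit, upper limit).
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    lower, upper = limits_from_summary(mean_d, sd_d, z)
    return mean_d, sd_d, lower, upper


if __name__ == "__main__":
    # Summary statistics quoted in the text: mean 1.07 g/dL, SD 0.36 g/dL.
    lower, upper = limits_from_summary(1.07, 0.36)
    print(f"95% limits of agreement: {lower:.2f} to {upper:.2f} g/dL")
```

Run as a script, this prints limits of about 0.36 to 1.78 g/dL, slightly narrower than the reported 0.35 to 1.79, presumably because the quoted mean and SD are rounded. With the raw paired values, `limits_of_agreement(photometry, bedside)` would compute everything from the data directly.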
Agreement between measurements is also referred to as concordance or reproducibility. Such an analysis considers pairs of measurements, either both categorical or both numeric, each pair having been made on one individual (or one pathology slide or X-ray).

Scatter plot showing the correlation between hemoglobin measurements from the two methods for the data presented in Table 3 and Figure 1.
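The scatter plot of hemoglobin measurements illustrates that a high correlation does not imply agreement: a correlation coefficient measures how closely the points follow *some* straight line, not whether they fall on the line of perfect agreement. The sketch below is a deliberately exaggerated illustration with invented values (not the Table 3 data), and the helper names are hypothetical. Every bedside reading is exactly 1 g/dL lower than the corresponding laboratory reading, so both Pearson's r and the Spearman rank correlation equal 1 even though no single pair agrees.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical hemoglobin values in g/dL (not the Table 3 data).
lab = [9.0, 10.5, 12.0, 13.5, 15.0]
bedside = [v - 1.0 for v in lab]  # every reading exactly 1 g/dL lower

mean_abs_diff = sum(abs(a - b) for a, b in zip(lab, bedside)) / len(lab)
print(f"Pearson r    = {pearson_r(lab, bedside):.2f}")
print(f"Spearman rho = {spearman_rho(lab, bedside):.2f}")
print(f"mean |difference| = {mean_abs_diff:.2f} g/dL")
```

Both correlations print as 1.00 while the mean absolute difference is 1.00 g/dL, which is why agreement is assessed from the differences themselves (as in a Bland-Altman plot) rather than from a correlation coefficient.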

In the scatter plot of these hemoglobin data, the dotted line is a trend line (least-squares line) through the observed values, and the correlation coefficient is 0.98. However, the individual points lie far from the line of perfect agreement (solid black line).

Fleiss' kappa is used when ratings from more than two observers are available for binary or ordinal data.

Kalantri et al. studied the accuracy and reliability of pallor as a tool for detecting anemia. They concluded that "clinical assessment of pallor can rule out and modestly rule in severe anemia." However, the inter-observer agreement for detecting pallor was very poor (κ = 0.07 for conjunctival pallor and 0.20 for tongue pallor), meaning that pallor is an unreliable sign for the diagnosis of anemia…
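Cohen's kappa for two raters is κ = (P_o − P_e)/(1 − P_e), where P_o is the observed agreement and P_e the agreement expected by chance from each rater's marginal proportions. The sketch below applies it to a 2 × 2 table of two examiners' pass/fail decisions and, to echo the Table 1 point, also computes the uncorrected McNemar statistic (b − c)²/(b + c). The function names are hypothetical, and the three tables are invented for illustration (they are not the Table 1 data): in each, both examiners pass 50% of candidates, so McNemar's statistic is 0 every time, yet κ ranges from 0 to 1.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Cohen's kappa for two raters making a binary (pass/fail) decision.

    a = both pass, b = rater 1 pass / rater 2 fail,
    c = rater 1 fail / rater 2 pass, d = both fail.
    """
    n = a + b + c + d
    p_o = (a + d) / n  # observed proportion of agreement
    # Chance agreement: product of the raters' marginal proportions,
    # summed over the "pass" and "fail" categories.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def mcnemar_chi2(b, c):
    """Uncorrected McNemar statistic; it uses only the discordant cells."""
    return 0.0 if b + c == 0 else (b - c) ** 2 / (b + c)


# Hypothetical tables (a, b, c, d) for 100 candidates; both examiners pass 50%.
tables = {
    "perfect agreement": (50, 0, 0, 50),
    "partial agreement": (40, 10, 10, 40),
    "chance-level agreement": (25, 25, 25, 25),
}

for label, (a, b, c, d) in tables.items():
    print(f"{label:>22}: McNemar chi2 = {mcnemar_chi2(b, c):.1f}, "
          f"kappa = {cohen_kappa_2x2(a, b, c, d):.2f}")
```

Equal pass percentages force b = c (since a + b = a + c), so McNemar's statistic cannot distinguish these tables, whereas κ gives 1.00, 0.60, and 0.00 respectively.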