Validity and reproducibility of ophthalmologist photo grading of diabetic retinopathy and glaucoma

Published: January 13, 2020
DOI: https://doi.org/10.1016/j.jcjo.2019.11.006
      Screening for diabetic retinopathy (DR) is an important component of diabetes management to prevent vision loss, yet many diabetic patients do not receive screening eye examinations.
      (Piyasena et al., 2019)
      Tele-ophthalmology programs that use fundus photography offer a potential solution, and technological advances have made automated photo interpretation a realistic possibility.
      (Gupta et al., 2014)
      Automated algorithms are typically trained on images whose disease status has been classified by a human, so the validity of human grading is important. And yet, although the variability of ophthalmologist grading for glaucoma has been described previously, limited information exists for DR.
      (Gupta et al., 2014; Lichter, 1976; Olson et al., 2003)
      This study attempts to fill that gap in order to better inform optimal methods of human grading for use in automated algorithms.
      In this cross-sectional diagnostic accuracy study, a convenience sample of diabetic patients was recruited from the ophthalmology clinic at Chiang Mai University, Thailand. After pupillary dilation, the fundus of each eye was examined by an ophthalmologist, who graded the presence of DR according to the International Clinical Diabetic Retinopathy Severity Scale and recorded the vertical cup-to-disk ratio (VCDR).
      (Wilkinson et al., 2003)
      Each eye was then photographed (TRC-NW6S, Topcon, Tokyo, Japan; manufacturer's software used to create an 85° montage image from photographs captured at 9 fixed locations using a repositionable internal fixation light). The mosaic image was graded for the presence of DR according to the International Clinical Diabetic Retinopathy Severity Scale and for the VCDR by 5 ophthalmologists who were masked to each other's grades and to all participant identifiers. Although mosaic images are not considered the gold standard for DR or glaucoma screening, they have been shown to have sensitivities equivalent to in-person ophthalmology examination.
      (Shiba et al., 2002)
      Glaucoma suspect was defined as a VCDR ≥0.6. Cohen's kappa statistic was used to assess agreement between photo grades and clinical examination and also agreement between the 5 photo graders, using bootstrapped 95% confidence intervals (CIs) resampled at the participant level to account for correlation of eyes from the same person (N = 999 replications). Ethics approval was obtained from University of California, San Francisco, and Chiang Mai University.
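      The agreement statistic and its participant-level bootstrap can be illustrated with a minimal sketch. It assumes unweighted Cohen's kappa and a percentile interval; the text does not state which bootstrap interval type was used, and the function names and toy records below are hypothetical, not the study's code or data.

```python
import random
from collections import defaultdict


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    n = len(a)
    categories = set(a) | set(b)
    # Observed agreement: fraction of items given the same label by both raters.
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if p_exp == 1.0:
        return float("nan")  # undefined when both raters use one identical label
    return (p_obs - p_exp) / (1.0 - p_exp)


def cluster_bootstrap_ci(records, n_boot=999, alpha=0.05, seed=0):
    """Percentile CI for kappa, resampling participants (not eyes) with replacement.

    records: list of (participant_id, photo_grade, exam_grade), one per eye.
    """
    by_person = defaultdict(list)
    for pid, photo, exam in records:
        by_person[pid].append((photo, exam))
    ids = sorted(by_person)
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Draw whole participants so both eyes of a person stay together.
        drawn = rng.choices(ids, k=len(ids))
        sample = [pair for pid in drawn for pair in by_person[pid]]
        k = cohen_kappa([p for p, _ in sample], [e for _, e in sample])
        if k == k:  # drop replicates where kappa is undefined (NaN)
            stats.append(k)
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper


# Hypothetical toy data: (participant, photo grade, clinical exam grade) per eye.
toy = [
    (1, "present", "present"), (1, "absent", "absent"),
    (2, "present", "present"), (2, "present", "absent"),
    (3, "absent", "absent"), (3, "absent", "absent"),
    (4, "cannot determine", "absent"), (4, "present", "present"),
    (5, "absent", "absent"), (5, "absent", "present"),
]
kappa = cohen_kappa([p for _, p, _ in toy], [e for _, _, e in toy])
ci_low, ci_high = cluster_bootstrap_ci(toy)
print(f"kappa = {kappa:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

      Resampling participants rather than eyes keeps both eyes of a person in the same replicate, which is what accounts for the within-person correlation.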
      In total, 235 eyes from 119 participants completed an ophthalmologist examination and fundus photography. Of these, 222 photo montages were judged to have adequate image clarity and coverage of the optic disk and macula by a majority of photo graders and were included in the analysis. On the reference standard ophthalmologist examination, 19 of 222 eyes were assessed as having a VCDR ≥0.6, and 97 eyes were diagnosed with DR—43 (19%) with nonproliferative DR and 54 (24%) with proliferative DR. The validity of photo grading is summarized in Table 1, which shows that agreement between each photo grader and the reference standard was greater for DR than VCDR, and that creating a majority consensus grade improved the agreement much more for VCDR than for DR assessment. The precision of photo grading is summarized in Table 2, which shows that inter-rater reproducibility between the 5 photo graders was significantly higher for DR (κ = 0.75, 95% CI 0.70–0.82) than glaucoma (κ = 0.40, 95% CI 0.28–0.52) (difference 0.35 higher for DR, 95% CI 0.21–0.50).
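      The majority consensus grade (the grade on which at least 3 of the 5 graders agreed) can be sketched as follows. This is an illustration only: the function name is hypothetical, and the text does not say how a photograph with no 3-grader majority (for example a 2-2-1 split across the three categories) was handled, so the sketch simply returns None in that case.

```python
from collections import Counter


def consensus_grade(grades, min_votes=3):
    """Return the label chosen by at least `min_votes` graders, else None."""
    label, votes = Counter(grades).most_common(1)[0]
    # Assumption: no consensus is reported when the top label lacks a majority.
    return label if votes >= min_votes else None


# Hypothetical grades from 5 graders for two photographs.
print(consensus_grade(["present", "present", "absent", "present", "absent"]))
print(consensus_grade(["present", "present", "absent", "absent", "cannot determine"]))
```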
      Table 1. Assessment of vertical cup-to-disk ratio and diabetic retinopathy by photo grading and agreement with in-person eye examination by an ophthalmologist

                       Finding on photography
      Photo grader     Present   Absent   Cannot determine   Agreement with eye exam, Cohen's κ (95% CI)†
      Any DR
       1               92        124      6                  0.91 (0.86–0.98)
       2               62        159      1                  0.70 (0.58–0.81)
       3               100       122      0                  0.94 (0.89–0.98)
       4               102       120      0                  0.86 (0.80–0.93)
       5               97        123      2                  0.87 (0.81–0.94)
       Consensus*      93        129      0                  0.89 (0.81–0.95)
      VCDR ≥0.6
       1               23        198      1                  0.56 (0.34–0.74)
       2               11        208      3                  0.38 (0.15–0.58)
       3               19        198      5                  0.47 (0.24–0.66)
       4               29        193      0                  0.58 (0.34–0.77)
       5               14        189      19                 0.35 (0.15–0.53)
       Consensus*      18        204      0                  0.73 (0.52–0.90)

      CI, confidence interval; DR, diabetic retinopathy; VCDR, vertical cup-to-disk ratio.
      * Results are also shown for the consensus grade, classified as that upon which at least 3 of the 5 graders agreed.
      † From 3 × 3 contingency table (present, absent, cannot determine); 95% bootstrap CIs were resampled at the person level to account for correlation of eyes from the same person.
      Table 2. Inter-rater agreement between 5 photo graders for retinopathy findings and cup-to-disk ratio thresholds

      Classification                         Number*   Cohen's κ (95% CI)†
      Any retinopathy feature                93        0.75 (0.69–0.81)
       Microaneurysms                        80        0.65 (0.59–0.71)
       Cotton-wool spots                     13        0.46 (0.29–0.59)
       Intraretinal hemorrhage               83        0.44 (0.33–0.56)
       Hard exudates                         41        0.57 (0.46–0.68)
       Neovascularization of the disk        7         0.43 (0.24–0.60)
       Neovascularization elsewhere          8         0.42 (0.29–0.52)
       Fibrous proliferation of the disk     15        0.73 (0.57–0.85)
       Fibrous proliferation elsewhere       18        0.63 (0.46–0.75)
       Preretinal hemorrhage                 6         0.59 (0.26–0.76)
       Vitreous hemorrhage                   3         0.24 (0.10–0.35)
      Glaucoma
       VCDR ≥0.6                             18        0.40 (0.27–0.52)
       VCDR ≥0.7                             4         0.21 (0.07–0.32)

      CI, confidence interval; VCDR, vertical cup-to-disk ratio.
      * Number of photographs for which a majority of graders judged the finding to be present.
      † From 3 × 3 contingency table (present, absent, cannot determine); 95% bootstrap CIs were resampled at the person level to account for correlation of eyes from the same person.
      In this study, inter-rater reproducibility was higher for DR assessment than for VCDR assessment, and photo grades for DR were a more valid indicator of disease (i.e., agreed better with the reference standard). The study was conducted using a single digital camera, so it is unclear whether these results generalize to lower-resolution imaging systems such as the ultra-widefield scanning ophthalmoscope. Nonetheless, these results suggest that automated algorithms for DR could be trained on fundus photographs graded by even a single person, whereas algorithms for VCDR would benefit from grading based on a consensus of multiple graders.

      Footnotes and Disclosure

      The authors have no proprietary or commercial interest in any materials discussed in this article.

      Acknowledgements

      This study was supported by the JaMel Perkins Family Foundation, the Fortisure Foundation, That Man May See, Research to Prevent Blindness, the Littlefield Trust, the Peierls Foundation, the Doris Duke Charitable Foundation, and the University of California, Berkeley, Blum Center for Developing Economies. Shyu was supported by an award from the Office of Medical Student Research at Vanderbilt University School of Medicine. Yen and Snyder were Doris Duke International Clinical Research Fellows.

      References

        Piyasena M., Murthy G.V.S., Yip J.L.Y., et al. Systematic review on barriers and enablers for access to diabetic retinopathy screening services in different income settings. PLoS One. 2019; 14: e0198979
        Gupta V., Bansal R., Gupta A., Bhansali A. Sensitivity and specificity of nonmydriatic digital imaging in screening diabetic retinopathy in Indian eyes. Indian J Ophthalmol. 2014; 62: 851-856
        Lichter P.R. Variability of expert observers in evaluating the optic disc. Trans Am Ophthalmol Soc. 1976; 74: 532-572
        Olson J.A., Strachan F.M., Hipwell J.H., et al. A comparative evaluation of digital imaging, retinal photography and optometrist examination in screening for diabetic retinopathy. Diabet Med. 2003; 20: 528-534
        Wilkinson C.P., Ferris 3rd, F.L., Klein R.E., et al. Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales. Ophthalmology. 2003; 110: 1677-1682
        Shiba T., Yamamoto T., Seki U., et al. Screening and follow-up of diabetic retinopathy using a new mosaic 9-field fundus photography system. Diabetes Res Clin Pract. 2002; 55: 49-59