Overdiagnosis: epidemiologic concepts and estimation
Abstract
The rapidly increasing incidence of thyroid cancer in South Korea has been attributed to overdiagnosis. Overdiagnosis is defined as ‘the detection of cancers that would never have been found were it not for the screening test’, and may be regarded as an extreme form of length bias due to indolent cancers, which is inevitable when conducting a cancer screening programme. Because it is solely an epidemiological concept, it can be estimated only indirectly, from phenomena such as the lack of a compensatory drop in incidence in post-screening periods or discrepancies between incidence and mortality. Previous studies quantifying overdiagnosis in screening mammography were reviewed in order to identify the data needed to estimate its extent in South Korea.
INTRODUCTION
According to the 2011 National Cancer Statistics in South Korea (hereafter Korea), thyroid cancer showed the highest annual percentage change (APC) at 23.3% [1]. Such an epidemic was suggested to be caused by overdiagnosis [2,3]. However, if the definition of overdiagnosis is not shared accurately, it can turn into a medical ethics issue that may lead to healthcare providers performing unnecessary treatment. Moreover, when the extent of overdiagnosis is not quantified, the public health policy implications cannot be properly addressed.
An overdiagnosis controversy already exists with respect to prostate, breast, renal, and thyroid cancer in Western countries [4], expressed as ‘an epidemic of diagnosis’ [5]. Overdiagnosis has been suspected particularly in mammography for the early diagnosis of breast cancer, which has been performed since the 1980s [6]. This suspicion has greatly impacted the medical field since 2000 [7], prompting diverse studies focused on estimating the extent of overdiagnosis [8,9].
Therefore, the purpose of this study is to examine the implications of the definition of overdiagnosis, focusing on papers regarding mammography overdiagnosis. The study also aims to organize the quantification methods used to estimate the extent of overdiagnosis, as well as the qualification criteria used to confirm its presence. Based on this, a further goal is to check the feasibility of these quantification methods under Korean conditions and to identify which data should be secured in the future. Such attempts are expected to stimulate evidence-based medical discussion of the overdiagnosis of thyroid cancer, which is currently controversial.
MAIN BODY
Change pattern in cancer incidence according to cancer screening
When a cancer screening program aimed at decreasing mortality via early diagnosis is newly initiated, or more precise diagnostic devices are introduced, the incidence of the corresponding cancer increases [10], but the pattern of change differs by screening phase [11,12]. At the beginning of screening, prevalent cases within the lead time are detected and diagnosed along with incident cases, producing a rapid increase in incidence [13-16]. As the screening program continues, only incident cases are detected, and the increase in incidence stabilizes [17,18]. Once screening is no longer provided, as when subjects reach old age, a compensatory drop occurs: the incidence suddenly decreases (because the cases within the lead time have already been diagnosed) and then converges to the incidence level of the non-screening group [9,19-21]. Biesheuvel et al. [9] schematized this pattern of change.
Meanwhile, the pattern of change varies depending on incidence levels and trends, the extent of implementation and accuracy of the screening methods, and the extent of overdiagnosis [22,23]. A cancer screening program inevitably results in some overdiagnosis [24-28]; nevertheless, because several factors shape the pattern, an increase in incidence should not be simplified as an outcome of overdiagnosis alone [29].
Definition of overdiagnosis
Table 1 shows sentences with definitions found in papers relevant to the overdiagnosis of screening mammography [4,9,16,20,24,25,28-44]. Tracing back by year of publication, they are based on the definition by Prorok et al. [30] in 1999, which is ‘the detection of cancers that would never have been found were it not for the screening test.’
However, there are several subtle differences among them. First, detection was used interchangeably with diagnosis [45,46] because the term ‘overdetection’ implies that it is the result of implementing unnecessary screening tests [9,47]. However, the definitions in Table 1 need to be differentiated by whether the object of detection or diagnosis is a cancer or a lesion. Although Zahl et al. [32] and Day [33] highlighted lesions, the remainder predominantly specified cancer, and a cancer diagnosis should always be accompanied by histopathological judgment. On this basis, the term overdiagnosis seems more reasonable than overdetection. Second is the distinction as to whether overdiagnosis includes invasive cancer as well as carcinoma in situ. Zahl et al. [32] described these cases as ‘low malignancy lesions’, whereas Moss [34] and Seigneurin et al. [39] emphasized the inclusion of invasive cancer. As the progression of diagnosed cancers can differ [4,48,49] and some cases undergo regression [50-53], it is reasonable to include invasive cancers as well [44,48]. Third, regarding conditions that are detected only with screening tools, Seigneurin et al. [39] emphasized their ‘non-progressive’ nature, and Cervera et al. [28] interpreted them as ‘not having resulted in morbimortality’. Because the final outcome of subjects diagnosed by screening can be identified only after death [43], it is difficult to develop methods for estimating overdiagnosis [4,29,54,55]. Fourth, there is a claim that the term ‘overtreatment’ should be used in lieu of overdiagnosis. Before the definition by Prorok et al. [30], Hurley and Kaldor [56] had already used the term ‘overtreatment of abnormalities’ to describe a harm of screening mammography in 1992, and Gur and Sumkin [57] suggested using the term ‘overtreatment’ when considering society as a whole as well as the medical field.
Even in cases of overdiagnosis, however, treatment may not be received, depending on the patient’s understanding of the disease and treatment preferences based on the potential gains or losses related to the outcome [58]. Thus, the term ‘overdiagnosis’ is more appropriate. Nevertheless, overtreatment results from a preceding overdiagnosis [46], and overtreatment is prevented once overdiagnosis is reduced [49].
Mechanism of overdiagnosis
Overdiagnosis occurs when a non-progressive or regressive cancer is diagnosed, or when the person dies of other causes before the cancer would have caused symptoms [4,20,29,36]. The first of these reflects inadequately controlled length time bias: screening more frequently detects indolent cancers, which show a better prognosis because they progress more slowly than expected, so that mortality would not change even without treatment [9,49,54,59-61]. In other words, the fact that future progression is unknown at the time of diagnosis is a major reason for overdiagnosis [8,20,33]. The occurrence of overdiagnosis according to various patterns of cancer progression was schematized by Welch and Black [4] and Zahl et al. [24] (Figure 1). Against this background, Esserman et al. [62] proposed the term ‘indolent lesion of epithelial origin (IDLE)’.
Judgment criteria for the presence of overdiagnosis and its estimation
When overdiagnosis occurs after the diagnosis of IDLE cancer, a compensatory drop in incidence does not occur after screening termination [8-10,20,21,63]; moreover, mortality does not change substantially despite the increase in incidence [4]. The presence of overdiagnosis is judged on the basis of these two phenomena (Table 2), which form the basis of the study by Ahn et al. [2] claiming that overdiagnosis is the main cause of the thyroid cancer epidemic among Koreans.
Meanwhile, the definition of overdiagnosis is based on epidemiologic deduction [29]; therefore, the extent of overdiagnosis cannot be directly estimated unless cancer progression can be known in advance, which is not possible at the current level of medicine [4,10,34,53,55]. Still, a best guess can be made when long-term follow-up data from randomized controlled trials are available [4,25,64,65]. To explain the estimation of the extent of overdiagnosis more conveniently, the contents described by Welch and Black [4] are schematized in Figure 2. Cancer identified in the screening group can be classified into cases diagnosed by screening (B) and cases diagnosed by clinical symptom appearance (As). Cancer detected in the control group that did not receive screening can be divided into cases diagnosed by clinical symptom appearance (An) and cases diagnosed after the screening completion (C). The difference between the screening group and the non-screening group, d (= As + B - An - C), is the extent of overdiagnosis, and d expressed as a percentage of the cases diagnosed by screening in the screening group (B) is the overdiagnosis proportion. While cases diagnosed in the screening group after screening termination are not further reflected, catch-up cases in the control group (C) are added in order to distinguish excess incident cases due to lead time bias from excess incident cases due to overdiagnosis [34,65-67]. The Malmö study had 15 years of additional observation after study completion [67]. If the incidence in the screening group does not fall after screening termination and remains high compared to the non-screening group, a longer follow-up period is necessary [8,21,68].
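The arithmetic of Figure 2 can be illustrated with a minimal sketch. The function names and the case counts below are hypothetical and chosen only for illustration; they are not taken from any of the cited trials, and the sketch assumes two randomized arms of equal size followed long enough for catch-up cases (C) to be counted.

```python
def overdiagnosis_extent(a_s: int, b: int, a_n: int, c: int) -> int:
    """Extent of overdiagnosis, d = As + B - An - C.

    a_s: screening-group cases diagnosed by clinical symptoms (As)
    b:   screening-group cases diagnosed by screening (B)
    a_n: control-group cases diagnosed by clinical symptoms (An)
    c:   control-group catch-up cases diagnosed after screening completion (C)
    Assumes equal-sized arms (an assumption of this sketch).
    """
    return a_s + b - a_n - c


def overdiagnosis_proportion(a_s: int, b: int, a_n: int, c: int) -> float:
    """Overdiagnosis proportion (%), with screen-detected cases (B) as denominator."""
    if b <= 0:
        raise ValueError("B (screen-detected cases) must be positive")
    return 100.0 * overdiagnosis_extent(a_s, b, a_n, c) / b


if __name__ == "__main__":
    # Hypothetical counts: As=200, B=500, An=450, C=200
    print(overdiagnosis_extent(200, 500, 450, 200))      # 50 excess cases
    print(overdiagnosis_proportion(200, 500, 450, 200))  # 10.0 (%)
```

With these hypothetical counts, 50 excess cases remain after catch-up in the control group, which corresponds to 10% of the screen-detected cases. Omitting C would inflate d by counting cases that were merely diagnosed earlier because of lead time.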
Instead of the d that can be obtained from randomized trials (Figure 2), the extent of overdiagnosis in an observational study is taken as the difference between the observed cumulative incidence in the screening group and the expected cumulative incidence in the control group [15,18,42,43,63,69,70]. However, the following four major items should be reviewed in the study plan to secure the validity of the estimates [9,20]. First, the selection of the non-screening control group should be assessed [27,43,55]. Randomized trials can secure the assumption that the risk of cancer incidence is at the same level in the control and screening groups [38,42,67], whereas in observational studies the control group that best satisfies this assumption must be selected [71]. A non-screening cohort can be established according to screening status in a prospective cohort study [72], or a historical national control group from before the introduction of the screening program can be used in a retrospective cohort study [9,10,27,73]. Second, the validity of the methods used to estimate the expected incidence in the control group should be determined [74,75]. Long-term incidence rates are needed; therefore, modeling that actively reflects the incidence trend via age-period-cohort (APC) analysis is necessary [15,18,39,65,69,75,76]. Third, the conceptual definition of lead time bias [24,77], the acceptability of the methods used to control it [4,8,21,53,72,73,78], and the restriction of the follow-up period [9,10,43,53] are important. Fourth, the possibility of obtaining information to differentiate As and B in Figure 2 among incident cases in the screening group is important [29]. This information is easy to obtain when population-based cancer registry data are used in a prospective cohort study [72]; in a retrospective cohort study, it may be derived from surveys performed for other purposes [55].
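The observational estimate described above can be sketched as follows. This is an illustration under stated assumptions rather than a method prescribed by the cited studies: cumulative incidence is approximated by summing annual rates per 100,000 over the follow-up years, the expected rates are assumed to have been projected beforehand (for example by age-period-cohort modeling), and expressing the excess relative to the expected cumulative incidence is only one of several possible denominators. The function name and all rates are hypothetical.

```python
from typing import Sequence, Tuple


def excess_cumulative_incidence(
    observed_rates: Sequence[float],
    expected_rates: Sequence[float],
) -> Tuple[float, float]:
    """Return (absolute excess, relative excess in %).

    observed_rates: annual incidence per 100,000 in the screened population
    expected_rates: annual incidence per 100,000 expected without screening,
                    projected elsewhere (assumption of this sketch)
    Both sequences must cover the same follow-up years, ideally including
    post-screening years so that any compensatory drop is captured.
    """
    if len(observed_rates) != len(expected_rates):
        raise ValueError("observed and expected must cover the same years")
    observed = sum(observed_rates)  # approximate cumulative incidence
    expected = sum(expected_rates)
    if expected <= 0:
        raise ValueError("expected cumulative incidence must be positive")
    excess = observed - expected
    return excess, 100.0 * excess / expected


if __name__ == "__main__":
    # Hypothetical annual rates per 100,000 over five follow-up years
    observed = [120, 150, 140, 135, 130]
    expected = [100, 102, 104, 106, 108]
    excess, relative = excess_cumulative_incidence(observed, expected)
    print(excess, round(relative, 1))
```

The result is only as valid as the expected rates supplied to it, which is why the choice of control group and the projection method are listed among the items to review.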
Therefore, observational studies use a variety of estimation methods, which differ in the type of control group selected, the method of estimating the expected incidence, and the method of controlling for lead time [8,20,32,42,43,79]. Biesheuvel et al. [9] classified these diverse study methods by the epidemiologic information used: incidence rate, cumulative incidence, and modeling. Based on this classification, the procedures for estimating the expected incidence in the control group, the roles of the National Cancer Registry Program, and the additional information required are summarized in Table 3 in order to identify which studies are feasible under domestic conditions and what type of information will be required in the future. The use of cumulative incidence rather than incidence rate is primarily recommended [9,20], and the study design differs depending on how the expected incidence of the control group is estimated (Table 3), while the National Cancer Registry Program plays an important role in all cases. Statistical modeling that can reflect the natural history of the disease [9,69] is possible only when such information, including interval cancers [69,80,81], is secured [22,35,39,53]. Considering these points and the lack of randomized trials of cancer screening in Korea, a cohort analysis of cumulative incidence can be performed first if it is known whether the diagnosis was made by screening or because of clinical findings [34,72]. If this is challenging, cumulative incidences before and after the introduction of the cancer screening program can be compared. In addition, considering that National Cancer Statistics information is available from 2002, estimation methods applying age-period-cohort analysis can be implemented [55,82].
CONCLUSION AND SUGGESTION
Estimating the extent of overdiagnosis is one way of investigating the harms and benefits of screening [38,55], and it provides important evidence for determining public health policies [20,43]. Considering current conditions in Korea, the extent of overdiagnosis can be estimated by establishing a strategy that makes the fullest possible use of statistical results from the Korea Central Cancer Registry database and of individual follow-up data [65,83]. The barriers are that information on whether a diagnosis was made by screening has been collected only in recent years [29], and that official statistics have been published only since 2002 [34]. To overcome these limitations, national research institutes should promote and lead relevant studies that make optimal use of the information that can be collected.
Despite efforts to validate the data used, limitations remain regarding the correct interpretation of the estimated extent of overdiagnosis [59,75]. For example, the extent of the increase in incidence can be interpreted differently according to the screening participation rate [43], sensitivity may change with the introduction of new screening methods [54], risk factors for breast cancer can change (e.g., an increase in oral hormone use or a change in breastfeeding history) [11,55], and effective treatments can be newly developed [55]. Because of these limitations, the estimated ranges of overdiagnosis in screening mammography are very broad [9,20], and overdiagnosis therefore remains controversial [43]. Although a checklist has been proposed for the appraisal of related studies [43], the ultimate solution to this controversy is to overcome the length time bias that causes overdiagnosis [49,54,59-61]. It is therefore necessary to identify the percentage of carcinoma in situ that does not progress to invasive cancer [84], the proportion of IDLE among all diagnosed cancers [62], and the percentage of cancers diagnosed at autopsy [85]. In addition, the Korea Central Cancer Registry has collected Surveillance, Epidemiology, and End Results (SEER) stage information for incident cases since 2009 [1,82], and it is necessary to obtain follow-up results by SEER stage for thyroid, breast, and prostate cancers, for which overdiagnosis is currently controversial. Lastly, given that those who are negatively affected by overdiagnosis are healthy individuals [4,27], decision aids that support a correct understanding of the losses and gains of screening need to be developed at the same time [58,86].
Acknowledgements
This study was supported by the 2013 cancer research support project of the Korea Foundation for Cancer Research (no. 2013-2).
Notes
The author has no conflicts of interest to declare for this study.
SUPPLEMENTARY MATERIAL
Supplementary material is available at http://www.e-epih.org/.