Calculation of smoking rates by dong/eup/myeon unit using small-area estimation in the Community Health Survey

Article information

Epidemiol Health. 2015;37.e2015013
Publication date (electronic) : 2015 March 2
doi : https://doi.org/10.4178/epih/e2015013
1Gallup Korea, Seoul, Korea
2Department of Applied Statistics, Hanshin University, Osan, Korea
3Division of Chronic Disease Control, Korea Centers for Disease Control and Prevention, Cheongju, Korea
Correspondence: Kay O Lee  Gallup Korea, 70 Sajik-ro, Jongno-gu, Seoul 110-054, Korea  Tel: +82-2-3702-2582, Fax: +82-2-3702-2628, Email: kolee575@hanmail.net
Received 2015 January 22; Accepted 2015 March 2.

INTRODUCTION

The Korean Community Health Survey (CHS) is a community-based, nationwide annual survey that provides important health indicators. It is conducted through stratified cluster sampling and computer-assisted personal interviewing. Using the dong/eup/myeon administrative units (hereafter units) and residential structures (apartment or single house) as stratification variables, 900 adults (age ≥19 years) per community health center district (hereafter district) are sampled and allocated proportionally across the units and residential structures; tong/ban/ri-level sample points are then selected with probability proportional to the number of households. From each selected sample point, five households on average are selected by systematic sampling, and individual interviews are conducted with all adults in each household [1]. Although district-level health indicators are produced with a specified level of precision, there is an increasing demand for unit-level health indicators. However, unit sample sizes vary considerably, ranging from tens to several hundreds, and for units with fewer than 30 sampled respondents, health indicators such as the smoking rate cannot be used when produced by conventional estimation methods, because the sampling variance of the estimates is exceedingly large. This problem can be addressed by calculating unit-level statistics with a special estimation method such as small-area estimation [2]. This paper presents an optimized estimation method, implemented in Statistical Analysis System (SAS) code.

Small-area estimation

Small-area estimation is designed to produce statistics for small areas that were not domains of the sample design and whose estimates are unusable because excessively small sample sizes give them high variance. It does so through supplementary use of survey information from surrounding areas, auxiliary information from other sources, or a statistical model of the population [3,4].

Given that the sample design for the Community Health Survey is intended to produce district-level health indicators, small-area estimation is needed for producing reliable unit-level health indicators. The following describes the small-area estimation methods for producing unit-level health indicators [5].

Direct estimator

A direct estimator uses only data obtained from the units concerned to produce unit-level health indicators. Each observation included in the survey data sets is given a weight item by item; the sample design-based direct estimator and its variance can be expressed by the following estimation equation, using weighted and observed values:

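(1) $\hat{\bar{Y}}_{Di} = \dfrac{\sum_{j=1}^{n_i} w_{ji}\, y_{ji}}{\sum_{j=1}^{n_i} w_{ji}}$

where $n_i$ is the sample size of unit $i$, $w_{ji}$ is the weight reflecting the sampling and response rates, and $y_{ji}$ is the observed value.

The variance of the estimator shown in Equation (1) can be obtained using Equation (2), as follows:

(2) $\widehat{\mathrm{Var}}(\hat{\bar{Y}}_{Di}) = \dfrac{1}{n_i (n_i - 1)\, \bar{w}_i^{2}} \sum_{j=1}^{n_i} w_{ji}^{2}\, (y_{ji} - R_t)^{2}$

where $R_t = \dfrac{\sum_{j=1}^{n_i} w_{ji}\, y_{ji}}{\sum_{j=1}^{n_i} w_{ji}}$ and $\bar{w}_i = \dfrac{1}{n_i} \sum_{j=1}^{n_i} w_{ji}$.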
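As a cross-check on Equations (1) and (2), the calculation can be sketched in a few lines of Python. This is an illustration only, not the authors' SAS code: the function name `direct_estimate` and the five-respondent unit are hypothetical, and the variance follows Equation (2) literally rather than reproducing the design-based variance that a survey procedure would report.

```python
def direct_estimate(weights, values):
    """Weighted ratio mean (Eq. 1) and its variance estimate (Eq. 2)."""
    n = len(weights)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired observations")
    total_w = sum(weights)
    # R_t in Eq. (2) is the same weighted ratio as the estimate in Eq. (1)
    ratio = sum(w * y for w, y in zip(weights, values)) / total_w
    mean_w = total_w / n  # w-bar_i
    squared_dev = sum(w ** 2 * (y - ratio) ** 2 for w, y in zip(weights, values))
    variance = squared_dev / (n * (n - 1) * mean_w ** 2)
    return ratio, variance


# Hypothetical unit with five respondents (1 = current smoker, 0 = not)
w = [600.0, 650.0, 650.0, 590.0, 600.0]
y = [1, 0, 0, 1, 0]
estimate, variance = direct_estimate(w, y)
print(round(estimate, 4), round(variance, 6))
```

With equal weights the function reduces to the ordinary sample mean and to s²/n for the variance, which is a convenient sanity check.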

Synthetic estimator

A synthetic estimator can yield more accurate unit-level estimates by using various auxiliary data, such as sex, age, and registered population, for each unit within a district [4,5]. After grouping the units into 2 to 3 homogeneous clusters, using their social environments and population ratios as clustering variables, under the assumption that the units of the same cluster might have similar sex-dependent/age-dependent health indicators, such as smoking rate, unit-level smoking rates can be calculated as follows, by combining smoking rates by sex and age group and by the number of people registered:

(3) $\hat{\bar{Y}}_{Si} = \dfrac{\sum_{k=1}^{n_c} r_{jk}\, N_{jki}}{\sum_{k=1}^{n_c} N_{jki}}$

where $r_{jk} = \dfrac{\sum_{l=1}^{n_g} w_{jkl}\, y_{jkl}}{\sum_{l=1}^{n_g} w_{jkl}}$ is the average estimate of category $k$ within cluster $j$, $N_{jki}$ is the registered population of category $k$ in unit $i$ within cluster $j$, $n_g$ is the number of samples in category $k$ within cluster $j$, and $n_c$ is the number of categories within cluster $j$.

The variance of the estimate shown in Equation (3) can be obtained using Equation (4):

(4) $\widehat{\mathrm{Var}}(\hat{\bar{Y}}_{Si}) = \sum_{k=1}^{n_c} z_{jki}^{2}\, \widehat{\mathrm{Var}}(r_{jk}) = \sum_{k=1}^{n_c} \dfrac{z_{jki}^{2}}{n_g (n_g - 1)\, \bar{w}_{jk}^{2}} \sum_{l=1}^{n_g} w_{jkl}^{2}\, (y_{jkl} - r_{jk})^{2}$

where $z_{jki} = \dfrac{N_{jki}}{\sum_{k=1}^{n_c} N_{jki}}$.

While a synthetic estimator is potentially biased, the bias is considered negligible, because clustering units within the same district makes the sex/age category characteristics similar across the units of a cluster. If the bias exceeds a negligible level, attention must be paid to the clustering, because the variance estimator in Equation (4) then tends to underestimate the estimation error of the estimator in Equation (3).
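Equation (3) can be illustrated with the article's own numbers. The Python sketch below is not the authors' code, and the function name `synthetic_estimate` is hypothetical. It combines the cluster 3 rates from Table 3 with the registered population ratios of Gaepo 1-dong from Table 1. As an assumption made only for this illustration, the squared standard errors from Table 3 stand in for Var(r_jk) in Equation (4), so the variance need not match Table 7 exactly.

```python
def synthetic_estimate(rates, rate_variances, populations):
    """Eq. (3): population-weighted mean of category rates; Eq. (4): its variance."""
    total = sum(populations)
    z = [n_k / total for n_k in populations]  # z_jki, category share within the unit
    estimate = sum(z_k * r_k for z_k, r_k in zip(z, rates))
    variance = sum(z_k ** 2 * v_k for z_k, v_k in zip(z, rate_variances))
    return estimate, variance


# Cluster 3 rates and standard errors (Table 3), ordered as
# male 19-39, male 40-59, male >=60, female 19-39, female 40-59, female >=60
rates = [0.413, 0.317, 0.207, 0.065, 0.0, 0.134]
std_errors = [0.068, 0.065, 0.085, 0.038, 0.0, 0.056]
# Registered population ratios of Gaepo 1-dong (Table 1), same order
ratios = [0.193, 0.193, 0.087, 0.189, 0.234, 0.105]

estimate, variance = synthetic_estimate(rates, [s ** 2 for s in std_errors], ratios)
print(round(estimate, 3), round(variance, 6))
```

Rounded to three decimals, the estimate agrees with the value reported for Gaepo 1-dong in Table 4 (0.185). Ratios are used in place of head counts because only the shares z_jki enter the formulas.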

Composite estimator

Although the direct estimator shown in Equation (1) is unbiased, its estimates are unreliable, with an unacceptably large standard error resulting from the small sample size, whereas the synthetic estimator shown in Equation (3) is biased. These problems can be addressed with a hybrid approach that obtains more reliable estimates from a weighted average of the two estimators. The weighted average of the estimators in Equations (1) and (3) is referred to as a composite estimator and can be calculated as follows:

(5) $\hat{\bar{Y}}_{Ci} = \alpha\, \hat{\bar{Y}}_{Di} + (1 - \alpha)\, \hat{\bar{Y}}_{Si}$

where $\alpha$ is the weight that minimizes $MSE(\hat{\bar{Y}}_{Ci})$ and can be calculated as follows:

(6) $\alpha = \dfrac{\widehat{\mathrm{Var}}(\hat{\bar{Y}}_{Si})}{\widehat{\mathrm{Var}}(\hat{\bar{Y}}_{Di}) + \widehat{\mathrm{Var}}(\hat{\bar{Y}}_{Si})}$

While the optimal value of $\alpha$ is expected to be the one that minimizes the root mean square error of $\hat{\bar{Y}}_{Ci}$, $\alpha$ is calculated using Equation (6) under the assumption that clustered units are sufficiently homogeneous to ignore the bias of the synthetic estimator $\hat{\bar{Y}}_{Si}$ and that the direct and synthetic estimators are independent of each other.
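Under the two assumptions just stated (negligible bias and independence), the variance of the composite estimator is α²·Var(direct) + (1−α)²·Var(synthetic), and Equation (6) is the value of α that minimizes it. The Python sketch below checks this numerically with a grid search. It is an illustration, not the authors' code; the function names and the two variance values are hypothetical.

```python
def composite_variance(alpha, var_direct, var_synthetic):
    """Variance of alpha*Y_D + (1 - alpha)*Y_S for independent, unbiased inputs."""
    return alpha ** 2 * var_direct + (1 - alpha) ** 2 * var_synthetic


def optimal_alpha(var_direct, var_synthetic):
    """Weight from Eq. (6)."""
    return var_synthetic / (var_direct + var_synthetic)


var_d, var_s = 0.006, 0.0008  # hypothetical variances, for illustration only
alpha = optimal_alpha(var_d, var_s)

# Brute-force search over alpha = 0.000, 0.001, ..., 1.000
grid = [i / 1000 for i in range(1001)]
best = min(grid, key=lambda a: composite_variance(a, var_d, var_s))

print(round(alpha, 4), best)
print(composite_variance(alpha, var_d, var_s) < min(var_d, var_s))
```

The grid minimum falls on the grid point nearest to the closed-form weight, and the resulting variance is smaller than either input variance, which is the motivation for combining the two estimators.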

Calculation of smoking rate using statistical analysis system

The following describes the calculation process of the unit-level smoking rates using the 2013 CHS data by applying small-area estimation to the calculation of smoking rates for the 22 dongs in Gangnam-gu, Seoul.

Direct estimator

The sample size distribution of the 22 dongs of Gangnam-gu included in the 2013 CHS shows that the dongs with the smallest and largest sample sizes are Gaepo 4-dong (n=24) and Yeoksam 1-dong (n=58), respectively. The smoking rates by dong were calculated using the direct estimator in Equation (1) and the variance estimator in Equation (2) with the following SAS code. An equivalent R program was presented in a previous study [6].

/*Generating the variables age groups and smoking/non-smoking from the Gangnam health center data*/
data abc.seoul_gangnam_data;
   set abc.chs13;
   length age_group $17; /* long enough for "60 years and over" */
   keep josa_year dong sm_a0100 sma_01z2 sma_03z2 age age_group sex wt;
   rename dong=eup_myeon_dong; /* the eup/myeon/dong unit shown in the tables */
   if 19<=age<=39 then age_group="19-39 years";
   if 40<=age<=59 then age_group="40-59 years";
   if 60<=age then age_group="60 years and over";
/* Current smoking rate (refer to [7] for other survey item variables)
   =========================
   Variable name: sm_a0100 (current smoking rate calculation variable)
   Analysis data variable names: sma_01z2 (permanent smoking or not)
                                 sma_03z2 (current smoking or not)
   ========================= */
   if sma_01z2 = 1 then do;
      if sma_03z2 in (1,2) then sm_a0100=1;
      else if sma_03z2 = 3 then sm_a0100=0;
   end;
   else if sma_01z2 = 2 then do;
      sm_a0100 = 0;
   end;
   if bogun_cd=001; /* Gangnam-gu health center code */
run;
/*Direct estimator and variance estimation of Gangnam-gu, Seoul*/
proc surveymeans data=abc.seoul_gangnam_data;
   var sm_a0100; /* current smoking rate calculation variable */
   domain eup_myeon_dong;
   weight wt; /* sample design weight */
   ods output Domain=abc.direct_estimator;
run;
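For readers who do not use SAS, the recoding logic of the data step above can be sketched in Python. This is an illustration rather than the authors' code: the function names are hypothetical, and only the branching on the codes is mirrored here; the meaning of each code is defined in the CHS guidelines [7].

```python
def age_group(age):
    """Age-group label, mirroring the three IF statements of the data step."""
    if 19 <= age <= 39:
        return "19-39 years"
    if 40 <= age <= 59:
        return "40-59 years"
    if age >= 60:
        return "60 years and over"
    return None  # under 19: outside the CHS target population


def current_smoker(sma_01z2, sma_03z2):
    """Indicator sm_a0100: 1 = current smoker, 0 = not, None = missing."""
    if sma_01z2 == 1:
        if sma_03z2 in (1, 2):
            return 1
        if sma_03z2 == 3:
            return 0
        return None  # follow-up item unanswered: left missing, as in SAS
    if sma_01z2 == 2:
        return 0
    return None


# First and fifth observations of Table 2
print(age_group(26), current_smoker(1, 1))
print(age_group(19), current_smoker(2, None))
```

Leaving unanswered items as missing rather than coding them as 0 keeps them out of both the numerator and the denominator of the weighted rate.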

Synthetic estimator

To calculate the synthetic estimators of smoking rates by dong, the 22 dongs of Gangnam-gu were grouped into three clusters. After calculating the smoking rates of each cluster by sex and age group (19 to 39, 40 to 59, ≥60), the smoking rates and variances by dong were calculated using Equations (3) and (4), respectively, as follows:

  • ① Calculation of the registered population ratios by sex and age group of 22 dongs as of the end of July 2013

  • ② Grouping 22 dongs in three clusters using the k-means method, with the population ratio and smoking rate as clustering variables, and with a homogeneity check of the clustered groups against the registered population size and socioeconomic environment

/*Cluster analysis according to the smoking rates of population ratios by sex and age group in Gangnam-gu, Seoul as of 2013*/
/*Data retrieval*/

proc import out=abc.seoul_gangnam_cluster_data
   datafile="D:\2014_research_activities\sas_lecture\seoul_gangnam-gu_cluster_data"
   dbms=excel replace;
   range="Seoul$";
   getnames=yes; mixed=no; scantext=yes;
   usedate=yes; scantime=yes;
run;
The imported data set is shown in Table 1.
/*k-means clustering*/
proc fastclus data=abc.seoul_gangnam_cluster_data
   maxc=3 out=abc.seoul_gangnam_kcluster;
   var male_2013_19_39_years male_2013_40_59_years male_2013_60_over
       female_2013_19_39_years female_2013_40_59_years female_2013_60_over;
   id eup_myeon_dong;
run;

/*Export clusters to Excel*/
proc export data=abc.seoul_gangnam_kcluster
   outfile="D:\2014research_activities\sas_lecture\Seoul_gangnam-gu_cluster_analysis"
   dbms=excel label replace;
run;

/*Cluster correction*/
proc import out=abc.seoul_gangnam_rcluster
   datafile="D:\2014research_activities\sas_lecture\Seoul_gangnam-gu_corrected_clustering"
   dbms=excel replace;
   range="seoul$";
   getnames=yes; mixed=no; scantext=yes;
   usedate=yes; scantime=yes;
run;
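The clustering idea in step ② can be sketched with a plain k-means loop in Python. This is only an illustration under stated assumptions: it uses six of the 22 dongs from Table 1, seeds the centroids with the first three rows, and runs Lloyd's algorithm, whereas PROC FASTCLUS selects its own seeds and the authors additionally corrected the clusters by hand. The resulting groups therefore need not match the corrected clusters used in the article. The names `kmeans` and `sq_dist` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with naive seeding (first k rows as initial centroids)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Registered population ratios from Table 1
# (male 19-39, 40-59, >=60; female 19-39, 40-59, >=60)
names = ["Nonhyeon 1-dong", "Yeoksam 1-dong", "Daechi 1-dong",
         "Daechi 2-dong", "Segok-dong", "Sooseo-dong"]
points = [
    [0.241, 0.149, 0.066, 0.309, 0.155, 0.080],
    [0.256, 0.155, 0.064, 0.310, 0.145, 0.070],
    [0.140, 0.262, 0.073, 0.159, 0.287, 0.078],
    [0.175, 0.238, 0.074, 0.173, 0.259, 0.080],
    [0.188, 0.193, 0.105, 0.188, 0.194, 0.132],
    [0.205, 0.144, 0.101, 0.185, 0.191, 0.173],
]

labels, centroids = kmeans(points, 3)
for name, label in zip(names, labels):
    print(name, label)
```

Because the outcome depends on the seeds, a homogeneity check against outside information, as described in step ②, remains necessary.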

③ Smoking rates of each cluster by sex and age group

/*Corrected cluster integration in data from Gangnam-gu, Seoul*/
proc sort data=abc.seoul_gangnam_data; by eup_myeon_dong; run;
proc sort data=abc.seoul_gangnam_rcluster; by eup_myeon_dong; run;
data abc.seoul_gangnam_data2;
   merge abc.seoul_gangnam_data abc.seoul_gangnam_rcluster;
   by eup_myeon_dong;
   group1=corrected_cluster||"_"||sex||"_"||age_group;
   group1=compress(group1); /* remove blanks left by the concatenation */
run;
proc print data=abc.seoul_gangnam_data2; run;
The corrected data set is displayed in Table 2.
/*Calculation of the estimated smoking rates of each cluster by sex and age group*/
proc surveymeans data=abc.seoul_gangnam_data2 mean;
   var sm_a0100;
   domain group1;
   weight wt;
   ods output Domain=abc.com_estimator_r;
run;
The results of calculation are contained in Table 3.

④ Calculation of synthetic estimates and variances of the smoking rates in 22 dongs

/*Preparation of the data for the cluster-level sex/age-dependent smoking rate estimation*/
/*Corrected cluster integration in data from Gangnam-gu, Seoul*/
proc sort data=abc.seoul_gangnam_data; by eup_myeon_dong; run;
proc sort data=abc.seoul_gangnam_rcluster; by eup_myeon_dong; run;
data abc.seoul_gangnam_data2;
   merge abc.seoul_gangnam_data abc.seoul_gangnam_rcluster;
   by eup_myeon_dong;
   group1=corrected_cluster||"_"||sex||"_"||age_group;
   group1=compress(group1);
run;
/*Synthetic estimator for the 22 dongs in Gangnam-gu, Seoul*/
proc surveymeans data=abc.seoul_gangnam_com mean;
   domain eup_myeon_dong;
   var mean; /* cluster-level smoking rate of each sex/age category */
   weight population; /* registered population of the category in the dong */
   ods output domain=abc.seoul_gangnam_comestimator_mean;
run;
Synthetic estimates are displayed in Table 4.
/*Variance estimation of the synthetic estimators for each dong*/
proc sort data=abc.seoul_gangnam_data2; by group1; run;
data abc.seoul_gangnam_comvar;
   merge abc.seoul_gangnam_data2 abc.com_estimator_r;
   by group1;
   sum_wj_yj_rj=((wt*wt)*((sm_a0100-mean)*(sm_a0100-mean)));
run;
/*Total weights of the k-category within a cluster*/
proc surveymeans data=abc.seoul_gangnam_comvar;
   domain group1;
   var wt;
   ods output domain=abc.seoul_gangnam_comvar_1;
run;

/*Total for the intragroup k-category variance estimation*/
proc tabulate data=abc.seoul_gangnam_comvar;
   class group1;
   var sum_wj_yj_rj;
   table group1*sum_wj_yj_rj;
   ods output table=abc.seoul_gangnam_comvar_2;
run;
data abc.seoul_gangnam_comvar_data1;
   merge abc.seoul_gangnam_pop abc.seoul_gangnam_comvar_1 abc.seoul_gangnam_comvar_2;
   by group1;
   keep eup_myeon_dong cluster sex age_group population N Mean sum_wj_yj_rj_sum group1;
run;
Table 5 contains the generated data set.
proc surveymeans data=abc.seoul_gangnam_comvar_data1 sum;
   domain eup_myeon_dong;
   var population;
   ods output domain=abc.seoul_gangnam_comvar_data2;
run;
proc sort data=abc.seoul_gangnam_comvar_data1; by eup_myeon_dong; run;
proc sort data=abc.seoul_gangnam_comvar_data2; by eup_myeon_dong; run;
/*Synthetic estimator variance estimation*/
data abc.seoul_gangnam_comvar_data;
   merge abc.seoul_gangnam_comvar_data1
         abc.seoul_gangnam_comvar_data2;
   by eup_myeon_dong;
   drop varname varlabel stddev DomainLabel;
   Zjk=Population/Sum; /* z_jki in Equation (4) */
   Var=((Zjk*Zjk)/((N*(N-1))*(mean*mean)))*sum_wj_yj_rj_sum; /* one term of Equation (4) */
run;
Table 6 contains the variance of the synthetic estimates.
proc surveymeans data=abc.seoul_gangnam_comvar_data sum;
   domain eup_myeon_dong;
   var var;
   ods output domain=abc.seoul_gangnam_comvariance;
run;

⑤ Data set generation by integrating the synthetic estimates and variance estimates for the smoking rates in all 22 dongs

/*Data integration of dong-level synthetic estimates and variance estimates*/
/*Synthetic estimator and variance*/
data abc.seoul_gangnam_estimator_com;
   merge abc.seoul_gangnam_comestimator_mean
abc.seoul_gangnam_comvariance;
   by eup_myeon_dong;
   drop DomainLabel VarName stderr StdDev;
   rename mean=Y_s sum=var_Y_s;
run;

Composite estimator

The dong-level direct and synthetic estimates of smoking rates are combined as weighted averages to calculate the composite estimates, using the following three methods for calculating the weights.

First, the weight $\alpha_i$ that minimizes the mean square error of the composite estimator, $MSE(\hat{\bar{Y}}_{Ci})$, shown in Equation (5) is expressed as follows:

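(7) $\alpha_{i(opt)} = \dfrac{MSE(\hat{\bar{Y}}_{Si})}{MSE(\hat{\bar{Y}}_{Si}) + V(\hat{\bar{Y}}_{Di})}$

The estimated value of the optimal weight $\alpha_{i(opt)}$ is calculated as follows:

(8) $\hat{\alpha}_{i(opt)} = \dfrac{MSE(\hat{\bar{Y}}_{Si})}{(\hat{\bar{Y}}_{Si} - \hat{\bar{Y}}_{i})^{2}}$

As an approach that gives a common weight $\alpha$ to all small areas, the weight minimizing the mean MSE across the areas is obtained as follows:

(9) $\hat{\alpha}_{opt} = 1 - \dfrac{\sum_{i} \hat{V}(\hat{\bar{Y}}_{Di})}{\sum_{i} (\hat{\bar{Y}}_{Si} - \hat{\bar{Y}}_{i})^{2}}$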
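Equation (9) can be illustrated in Python with the seven cluster 2 dongs of Table 7. The sketch is not the authors' code and rests on two assumptions: the symbol Ȳ_i in Equation (9) is taken to be the direct estimate, and the result is truncated to the interval [0, 1], a safeguard added here that the article does not mention. The function name `common_weight` is hypothetical.

```python
def common_weight(direct, synthetic, var_direct):
    """Common weight from Eq. (9), truncated to [0, 1] (truncation is an added safeguard)."""
    numerator = sum(var_direct)
    denominator = sum((s - d) ** 2 for s, d in zip(synthetic, direct))
    alpha = 1 - numerator / denominator
    return min(1.0, max(0.0, alpha))


# Cluster 2 dongs from Table 7: Daechi 1, Daechi 2, Daechi 4,
# Dogok 1, Dogok 2, Yeoksam 1, Yeoksam 2
y_d = [0.186, 0.070, 0.357, 0.185, 0.075, 0.365, 0.267]    # direct estimates
y_s = [0.211, 0.214, 0.213, 0.210, 0.205, 0.219, 0.211]    # synthetic estimates
var_d = [0.006, 0.001, 0.008, 0.003, 0.001, 0.004, 0.005]  # Var_Y_d

alpha = common_weight(y_d, y_s, var_d)
print(round(alpha, 3))
```

The weight grows when the direct and synthetic estimates disagree by more than the direct variances can explain, which shifts the composite toward the direct estimates.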

The weight dependent on the sample size assigned to each small area is calculated as follows:

(10) $\alpha_i(\delta) = \begin{cases} 1, & \hat{N}_i \ge \delta N_i \\ \hat{N}_i / (\delta N_i), & \text{otherwise} \end{cases}$

where $N_i$ is the population size of small area $i$, $\hat{N}_i = N(n_i/n)$ is its direct estimator, and $\delta$ is a subjectively determined value that adjusts the synthetic estimator's contribution. The Canadian Labor Force Survey, for example, uses δ=2/3, which is also applied to this calculation [8].
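The sample-size-dependent weight in Equation (10) can be sketched in Python as follows. The function name and all numbers are hypothetical and serve only to show the two branches of the rule with δ=2/3; this is not the authors' code.

```python
def size_dependent_weight(n_i, n_total, pop_total, pop_i, delta=2 / 3):
    """Weight from Eq. (10): 1 if the expanded sample size reaches delta*N_i, else their ratio."""
    n_hat = pop_total * n_i / n_total  # N-hat_i = N * (n_i / n)
    threshold = delta * pop_i
    return 1.0 if n_hat >= threshold else n_hat / threshold


# Hypothetical cluster: 100,000 registered adults, 300 respondents.
# A unit with 20,000 registered adults and 30 respondents is under-sampled...
print(size_dependent_weight(30, 300, 100_000, 20_000))
# ...whereas 50 respondents are enough for the direct estimate to stand alone.
print(size_dependent_weight(50, 300, 100_000, 20_000))
```

A smaller δ makes the rule trust the direct estimator more readily, so the choice of δ directly controls how often the synthetic estimator contributes.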

After calculating the composite estimators of the dong-level smoking rates with each of the three types of weight presented above, an optimal estimation method is selected. The composite estimates are calculated in the following procedure.

① Data set generation integrating the direct estimates (Y_d) and synthetic estimates (Y_s) calculated for the 22 dongs

/*Integration of direct and synthetic estimators (estimates and variance)*/
data abc.seoul_gangnam_estimators;
   merge abc.seoul_gangnam_estimator_direct abc.seoul_gangnam_estimator_com;
   by eup_myeon_dong;
run;
Table 7 contains the direct and synthetic estimates.

② Calculation of composite estimates using the first weight (Y_c1)

/*Composite estimator_1*/
data abc.seoul_gangnam_estimator_c1;
   set abc.seoul_gangnam_estimators;
   alpha1=Var_Y_s/(Var_Y_d+Var_Y_s);
   Y_c1=(alpha1*Y_d)+((1-alpha1)*Y_s);
Var_Y_c1=((alpha1*alpha1)*Var_y_d)+(((1-alpha1)*(1-alpha1))*var_Y_s);
   sumvar_Ys_Yd=(var_Y_s+var_Y_d);
run;
/*Direct estimator variance by corrected cluster and direct estimator variance + composite estimator*/

proc surveymeans data=abc.seoul_gangnam_estimator_c1 sum;
   domain corrected_cluster;
   var Var_Y_d;
   ods output domain=abc1;
run;
data abc1;
   set abc1;
   rename sum=sum1;
run;
proc surveymeans data=abc.seoul_gangnam_estimator_c1 sum;
   domain corrected_cluster;
   var sumvar_Ys_Yd;
   ods output domain=abc2;
run;
data abc3;
   merge abc1 abc2;
   by corrected_cluster;
   alpha2=1-(sum1/sum);
   keep corrected_cluster alpha2;
run;
proc sort data=abc.seoul_gangnam_estimator_c1; by corrected_cluster; run;

③ Calculation of composite estimates using the second weight (Y_c2)

/*Composite estimator_2*/
data abc.seoul_gangnam_estimator_c2;
   merge abc.seoul_gangnam_estimator_c1 abc3;
   by corrected_cluster;
   Y_c2=(alpha2*Y_d)+((1-alpha2)*Y_s);
   Var_Y_c2=((alpha2*alpha2)*Var_y_d)+(((1-alpha2)*(1-alpha2))*var_Y_s);
run;
proc surveymeans data=abc.seoul_gangnam_pop sum; domain eup_myeon_dong; var population; ods output domain=abc4; run;
proc surveymeans data=abc.seoul_gangnam_pop sum; domain corrected_cluster; var population; ods output domain=abc5; run;
proc surveymeans data=abc.seoul_gangnam_estimator_c2 sum; domain corrected_cluster; var N; ods output domain=abc6; run;
proc sort data=abc4; by eup_myeon_dong; run;
data abc5; set abc5; rename Sum=cluster_population; run;
proc sort data=abc5; by corrected_cluster; run;
data abc6; set abc6; rename Sum=cluster_sample_size; run;
proc sort data=abc6; by corrected_cluster; run;
proc sort data=abc.seoul_gangnam_estimator_c2; by eup_myeon_dong; run;
data abc.seoul_gangnam_estimator_c3_1;
   merge abc.seoul_gangnam_estimator_c2 abc4;
   by eup_myeon_dong;
   drop DomainLabel VarName VarLabel StdDev;
   rename sum=registered_population;
run;
proc sort data=abc.seoul_gangnam_estimator_c3_1; by corrected_cluster; run;
data abc.seoul_gangnam_estimator_c3_2;
   merge abc.seoul_gangnam_estimator_c3_1 abc5 abc6;
   by corrected_cluster;
   drop DomainLabel VarName VarLabel StdDev;
run;

④ Calculation of composite estimates using the third weight (Y_c3)

/*Composite estimator_3*/
data abc.seoul_gangnam_estimator_c3;
   set abc.seoul_gangnam_estimator_c3_2;
   hat_N_i=cluster_population*(N/cluster_sample_size); /* N-hat_i in Equation (10) */
   if hat_N_i>=((2/3)*registered_population) then alpha3=1;
   else alpha3=hat_N_i/((2/3)*registered_population);
   Y_c3=(alpha3*Y_d)+((1-alpha3)*Y_s);
   Var_Y_c3=((alpha3*alpha3)*Var_y_d)+(((1-alpha3)*(1-alpha3))*var_Y_s);
run;

⑤ Calculation of the composite estimates using the average weight of the first and third weights (Y_c4)

/*Composite estimator_4*/
data abc.seoul_gangnam_estimator_c4;
   set abc.seoul_gangnam_estimator_c3;
   alpha4=(alpha1+alpha3)/2;
   Y_c4=(alpha4*Y_d)+((1-alpha4)*Y_s);
   Var_Y_c4=((alpha4*alpha4)*Var_y_d)+(((1-alpha4)*(1-alpha4))*var_Y_s);
run;
/*Direct estimator_synthetic estimator_composite estimator*/
data abc.estimator_total;
   set abc.seoul_gangnam_estimator_c4;
   keep eup_myeon_dong Y_d Var_Y_d Y_s var_y_s alpha1 Y_c1 var_Y_c1 alpha3 Y_c3 var_y_c3 Y_c4 var_y_c4;
run;
proc print data=abc.estimator_total; run;

Table 8 presents the current smoking rates of the 22 dongs calculated with the four types of composite estimates. Although the variances are lower than those of the direct estimates, the current smoking rate estimates vary widely depending on the dong sample sizes, reflecting the estimate-stabilizing feature of composite estimation. In terms of reducing the variance of the direct estimates and correcting the bias of the synthetic estimates, the fourth composite estimator is considered the most effective in stabilizing the dong-level estimates.
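As a worked check, the Nonhyeon 1-dong row of Table 8 can be recomputed in Python from the published inputs. The sketch is illustrative and not the authors' code: the function name `composite` is hypothetical, the direct estimate and its variance are read from Table 8, the synthetic estimate and its variance from Table 7, and the third weight is taken to be 1 because Y_c3 equals Y_d for this dong in Table 8.

```python
def composite(alpha, y_d, y_s, var_d, var_s):
    """Composite estimate (Eq. 5) and its variance, as computed in the SAS data steps."""
    estimate = alpha * y_d + (1 - alpha) * y_s
    variance = alpha ** 2 * var_d + (1 - alpha) ** 2 * var_s
    return estimate, variance


y_d, var_d = 0.3832, 0.0063    # direct estimate and variance (Table 8)
y_s, var_s = 0.265, 0.000828   # synthetic estimate and variance (Table 7)

alpha1 = var_s / (var_d + var_s)   # first weight, Eq. (6)
alpha3 = 1.0                       # inferred from Y_c3 == Y_d in Table 8
alpha4 = (alpha1 + alpha3) / 2     # average of the first and third weights

y_c1, v_c1 = composite(alpha1, y_d, y_s, var_d, var_s)
y_c4, v_c4 = composite(alpha4, y_d, y_s, var_d, var_s)
print(round(y_c1, 4), round(v_c1, 4))
print(round(y_c4, 4), round(v_c4, 4))
```

Up to rounding of the published inputs, the results agree with the Table 8 entries for this dong (Y_c1 = 0.2787, V_c1 = 0.0007, Y_c4 = 0.3310, V_c4 = 0.0021).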

CONCLUSION

This paper explained, through a numerical example, how small-area estimation can be applied to produce health indicators at the dong/eup/myeon level (smaller than the designed domain) from CHS data, which were collected to produce health indicators at the level of the health center district (the designed domain). The calculation procedures were presented as SAS code, which is expected to be useful when statistics for smaller or more detailed areas are to be produced from a survey designed to produce statistics at a larger domain level.

Acknowledgements

This work was supported by the Research Program funded by the Korea Centers for Disease Control and Prevention (fund code 2014-P33001-00).

Notes

The author has no conflicts of interest to declare for this study.

SUPPLEMENTARY MATERIAL

Supplementary material is available at http://www.e-epih.org/.

References

1. Korea Centers for Disease Control and Prevention. Sample design for 2013 National Community Health Survey and monitoring sample survey. Cheongju: Korea Centers for Disease Control and Prevention; 2013. p. 9–11. (Korean).
2. Ghosh M, Rao JN. Small area estimation: an appraisal. Stat Sci 1994;9:55–93.
3. Lee KO, Chung YS. Small-area estimation: research report. Daejeon: Statistics Korea; 2001. p. 70–76. (Korean).
4. Gonzalez Jr JF, Placek PJ, Scott C. Synthetic estimation in followback surveys at the National Center for Health Statistics. New York: Springer; 1996. p. 16–27.
5. Lee KO. Application of small-area estimation for estimating the number of the unemployed by si/gun/gu unit. Korean J Appl Stat 2000;13:275–285. (Korean).
6. Korea Centers for Disease Control and Prevention. The study program (algorithm) to compute dong/eub/meon’s statistics in Community Health Survey. Cheongju: Korea Centers for Disease Control and Prevention; 2014. p. 24–43. (Korean).
7. Korea Centers for Disease Control and Prevention. Community Health Survey raw data use guidelines. Cheongju: Korea Centers for Disease Control and Prevention; 2012. p. 28–35. (Korean).
8. Singh MP, Gambino J, Mantel HJ. Issues and strategies for small-area data. Surv Methodol 1994;20:3–22.

Table 1.

Registered population ratios by sex and age group for each dong

| Dong | Male 19-39 | Male 40-59 | Male ≥60 | Female 19-39 | Female 40-59 | Female ≥60 |
|---|---|---|---|---|---|---|
| Shinsa-dong | 0.188 | 0.174 | 0.094 | 0.243 | 0.192 | 0.109 |
| Nonhyeon 1-dong | 0.241 | 0.149 | 0.066 | 0.309 | 0.155 | 0.080 |
| Nonhyeon 2-dong | 0.220 | 0.158 | 0.083 | 0.268 | 0.175 | 0.095 |
| Apgujeong-dong | 0.172 | 0.178 | 0.106 | 0.209 | 0.206 | 0.128 |
| Cheongdam-dong | 0.198 | 0.182 | 0.085 | 0.228 | 0.209 | 0.098 |
| Samsung 1-dong | 0.203 | 0.191 | 0.091 | 0.200 | 0.217 | 0.099 |
| Samsung 2-dong | 0.203 | 0.189 | 0.069 | 0.257 | 0.198 | 0.084 |
| Daechi 1-dong | 0.140 | 0.262 | 0.073 | 0.159 | 0.287 | 0.078 |
| Daechi 2-dong | 0.175 | 0.238 | 0.074 | 0.173 | 0.259 | 0.080 |
| Daechi 4-dong | 0.211 | 0.188 | 0.058 | 0.257 | 0.216 | 0.069 |
| Yeoksam 1-dong | 0.256 | 0.155 | 0.064 | 0.310 | 0.145 | 0.070 |
| Yeoksam 2-dong | 0.194 | 0.199 | 0.067 | 0.247 | 0.210 | 0.083 |
| Dogok 1-dong | 0.198 | 0.195 | 0.081 | 0.223 | 0.214 | 0.090 |
| Dogok 2-dong | 0.167 | 0.211 | 0.090 | 0.203 | 0.239 | 0.091 |
| Gaepo 1-dong | 0.193 | 0.193 | 0.087 | 0.189 | 0.234 | 0.105 |
| Gaepo 2-dong | 0.209 | 0.204 | 0.061 | 0.211 | 0.233 | 0.083 |
| Gaepo 4-dong | 0.207 | 0.204 | 0.070 | 0.208 | 0.226 | 0.085 |
| Segok-dong | 0.188 | 0.193 | 0.105 | 0.188 | 0.194 | 0.132 |
| Ilwonbon-dong | 0.182 | 0.226 | 0.065 | 0.195 | 0.253 | 0.079 |
| Ilwon 1-dong | 0.209 | 0.179 | 0.091 | 0.196 | 0.205 | 0.120 |
| Ilwon 2-dong | 0.205 | 0.189 | 0.077 | 0.193 | 0.221 | 0.114 |
| Sooseo-dong | 0.205 | 0.144 | 0.101 | 0.185 | 0.191 | 0.173 |

Table 2.

Examples of data structure, including corrected cluster, sex and age group, and variables for calculating the smoking rate

| Observation | Eup/myeon/dong | Sex | sma_01z2 | sma_03z2 | Age | wt | Age_group | sm_a0100 | Cluster | Corrected cluster | Group1 (age) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Nonhyeon 1-dong | M | 1 | 1 | 26 | 592.15 | 19-39 | 1 | 2 | 1 | 1_1_19-39 |
| 2 | Nonhyeon 1-dong | M | 1 | 1 | 36 | 647.70 | 19-39 | 1 | 2 | 1 | 1_1_19-39 |
| 3 | Nonhyeon 1-dong | M | 1 | 1 | 35 | 647.46 | 19-39 | 1 | 2 | 1 | 1_1_19-39 |
| 4 | Nonhyeon 1-dong | M | 1 | 1 | 25 | 592.15 | 19-39 | 1 | 2 | 1 | 1_1_19-39 |
| 5 | Nonhyeon 1-dong | M | 2 | - | 19 | 592.15 | 19-39 | 0 | 2 | 1 | 1_1_19-39 |
| 6 | Nonhyeon 1-dong | M | 2 | - | 32 | 647.46 | 19-39 | 0 | 2 | 1 | 1_1_19-39 |
| 7 | Nonhyeon 1-dong | M | 1 | 1 | 32 | 647.46 | 19-39 | 1 | 2 | 1 | 1_1_19-39 |
| 8 | Nonhyeon 1-dong | M | 1 | 1 | 39 | 647.46 | 19-39 | 1 | 2 | 1 | 1_1_19-39 |

sma, smoking variable; wt, weight; M, male.

Table 3.

Current smoking rate estimates and standard errors by group (cluster, sex/age)

| Group (age) | Variable | Mean | Standard error |
|---|---|---|---|
| 1_1_19-39 | sm_a0100 | 0.465 | 0.074 |
| 1_1_40-59 | sm_a0100 | 0.474 | 0.076 |
| 1_1_≥60 | sm_a0100 | 0.242 | 0.085 |
| 1_2_19-39 | sm_a0100 | 0.168 | 0.056 |
| 1_2_40-59 | sm_a0100 | 0.075 | 0.032 |
| 1_2_≥60 | sm_a0100 | 0.032 | 0.031 |
| 2_1_19-39 | sm_a0100 | 0.364 | 0.074 |
| 2_1_40-59 | sm_a0100 | 0.419 | 0.062 |
| 2_1_≥60 | sm_a0100 | 0.160 | 0.062 |
| 2_2_19-39 | sm_a0100 | 0.137 | 0.048 |
| 2_2_40-59 | sm_a0100 | 0.057 | 0.030 |
| 2_2_≥60 | sm_a0100 | 0.0 | 0.0 |
| 3_1_19-39 | sm_a0100 | 0.413 | 0.068 |
| 3_1_40-59 | sm_a0100 | 0.317 | 0.065 |
| 3_1_≥60 | sm_a0100 | 0.207 | 0.085 |
| 3_2_19-39 | sm_a0100 | 0.065 | 0.038 |
| 3_2_40-59 | sm_a0100 | 0.0 | 0.0 |
| 3_2_≥60 | sm_a0100 | 0.134 | 0.056 |

Table 4.

Current smoking rate estimates and standard errors

| Dong | Mean | Standard error |
|---|---|---|
| Gaepo 1-dong | 0.185 | 0.071 |
| Gaepo 2-dong | 0.188 | 0.074 |
| Gaepo 4-dong | 0.190 | 0.073 |
| Nonhyeon 1-dong | 0.265 | 0.074 |
| Nonhyeon 2-dong | 0.259 | 0.073 |
| Daechi 1-dong | 0.211 | 0.076 |
| Daechi 2-dong | 0.214 | 0.072 |
| Daechi 4-dong | 0.213 | 0.066 |
| Dogok 1-dong | 0.210 | 0.066 |
| Dogok 2-dong | 0.205 | 0.068 |
| Samsung 1-dong | 0.260 | 0.077 |
| Samsung 2-dong | 0.261 | 0.075 |
| Segok-dong | 0.190 | 0.066 |
| Sooseo-dong | 0.187 | 0.066 |
| Shinsa-dong | 0.251 | 0.073 |
| Apgujeong-dong | 0.245 | 0.073 |
| Yeoksam 1-dong | 0.219 | 0.062 |
| Yeoksam 2-dong | 0.211 | 0.066 |
| Ilwon 1-dong | 0.190 | 0.070 |
| Ilwon 2-dong | 0.189 | 0.071 |
| Ilwonbon-dong | 0.184 | 0.073 |
| Cheongdam-dong | 0.256 | 0.075 |

Table 5.

Variance of synthetic estimates by group (cluster, sex/age group)

| Observation | Eup/myeon/dong | Corrected cluster | Sex | Age | Population | Group1 (age) | n | Mean (wt) | sum_wj_yj_rj_Sum |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Shinsa-dong | 1 | M | 19-39 | 3,071 | 1_1_19-39 | 47 | 629.36 | 4825692 |
| 2 | Nonhyeon 1-dong | 1 | M | 19-39 | 5,629 | 1_1_19-39 | 47 | 629.36 | 4825692 |
| 3 | Nonhyeon 2-dong | 1 | M | 19-39 | 4,267 | 1_1_19-39 | 47 | 629.36 | 4825692 |
| 4 | Apgujeong-dong | 1 | M | 19-39 | 4,014 | 1_1_19-39 | 47 | 629.36 | 4825692 |
| 5 | Cheongdam-dong | 1 | M | 19-39 | 5,060 | 1_1_19-39 | 47 | 629.36 | 4825692 |
| 6 | Samsung 1-dong | 1 | M | 19-39 | 2,577 | 1_1_19-39 | 47 | 629.36 | 4825692 |
| 7 | Samsung 2-dong | 1 | M | 19-39 | 5,224 | 1_1_19-39 | 47 | 629.36 | 4825692 |
| 8 | Shinsa-dong | 1 | M | 40-59 | 2,841 | 1_1_40-59 | 48 | 470.96 | 2914779 |
| 9 | Nonhyeon 1-dong | 1 | M | 40-59 | 3,476 | 1_1_40-59 | 48 | 470.96 | 2914779 |
| 10 | Nonhyeon 2-dong | 1 | M | 40-59 | 3,061 | 1_1_40-59 | 48 | 470.96 | 2914779 |
| 11 | Apgujeong-dong | 1 | M | 40-59 | 4,153 | 1_1_40-59 | 48 | 470.96 | 2914779 |
| 12 | Cheongdam-dong | 1 | M | 40-59 | 4,659 | 1_1_40-59 | 48 | 470.96 | 2914779 |
| 13 | Samsung 1-dong | 1 | M | 40-59 | 2,427 | 1_1_40-59 | 48 | 470.96 | 2914779 |
| 14 | Samsung 2-dong | 1 | M | 40-59 | 4,852 | 1_1_40-59 | 48 | 470.96 | 2914779 |
| 15 | Shinsa-dong | 1 | M | ≥60 | 1,527 | 1_1_≥60 | 27 | 414.07 | 898420.2 |
| 16 | Nonhyeon 1-dong | 1 | M | ≥60 | 1,540 | 1_1_≥60 | 27 | 414.07 | 898420.2 |
| 17 | Nonhyeon 2-dong | 1 | M | ≥60 | 1,613 | 1_1_≥60 | 27 | 414.07 | 898420.2 |
| 18 | Apgujeong-dong | 1 | M | ≥60 | 2,462 | 1_1_≥60 | 27 | 414.07 | 898420.2 |
| 19 | Cheongdam-dong | 1 | M | ≥60 | 2,179 | 1_1_≥60 | 27 | 414.07 | 898420.2 |

wt, weight; wj, weight; yj, observed value; rj, smoking rate; M, male.

Table 6.

Variance calculation of synthetic estimates by dong and group

| Observation | Eup/myeon/dong | Corrected cluster | Sex | Age | Population | Group1 | n | Mean | sum_wj_yj_rj_Sum | Sum | Zjk | Var |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Gaepo 1-dong | 3 | M | 19-39 | 3,677 | 3_1_19-39 | 54 | 615.277 | 5180109 | 19069 | 0.193 | 0.000178 |
| 2 | Gaepo 1-dong | 3 | M | 40-59 | 3,677 | 3_1_40-59 | 58 | 461.391 | 3071663 | 19069 | 0.193 | 0.000162 |
| 3 | Gaepo 1-dong | 3 | M | ≥60 | 1,659 | 3_1_≥60 | 29 | 406.868 | 1009267 | 19069 | 0.087 | 5.68E-05 |
| 4 | Gaepo 1-dong | 3 | F | 19-39 | 3,601 | 3_1_19-39 | 64 | 583.223 | 2031942 | 19069 | 0.189 | 5.28E-05 |
| 5 | Gaepo 1-dong | 3 | F | 40-59 | 4,455 | 3_1_40-59 | 78 | 419.521 | 0 | 19069 | 0.233 | 0.00 |
| 6 | Gaepo 1-dong | 3 | F | ≥60 | 2,030 | 3_1_≥60 | 37 | 411.813 | 728625.3 | 19069 | 0.105 | 3.55E-05 |
| 7 | Gaepo 2-dong | 3 | M | 19-39 | 5,704 | 3_1_19-39 | 54 | 615.267 | 5180109 | 27359 | 0.209 | 0.000209 |
| 8 | Gaepo 2-dong | 3 | M | 40-59 | 5,590 | 3_1_40-59 | 58 | 461.391 | 3071663 | 27359 | 0.204 | 0.000182 |
| 9 | Gaepo 2-dong | 3 | M | ≥60 | 1,656 | 3_1_≥60 | 29 | 446.828 | 1009267 | 27359 | 0.060 | 2.75E-05 |
| 10 | Gaepo 2-dong | 3 | F | 19-39 | 5,760 | 3_1_19-39 | 64 | 583.223 | 2031942 | 27359 | 0.210 | 6.57E-05 |
| 11 | Gaepo 2-dong | 3 | F | 40-59 | 6,368 | 3_1_40-59 | 78 | 419.521 | 0 | 27359 | 0.233 | 0.00 |
| 12 | Gaepo 2-dong | 3 | F | ≥60 | 3,247 | 3_1_≥60 | 37 | 481.813 | 728625.3 | 27359 | 0.083 | 2.21E-05 |

wj, weight; yj, observed value; rj, smoking rate; Zjk, proportion of registered population of category k; Var, variance; M, male; F, female.

Table 7.

Direct estimator and synthetic estimator calculation results for current smoking rates by dong

| Observation | Eup/myeon/dong | n | Y_d | Var_Y_d | Corrected cluster | Y_s | var_Y_s |
|---|---|---|---|---|---|---|---|
| 1 | Gaepo 1-dong | 51 | 0.149 | 0.003 | 3 | 0.185 | 0.000485 |
| 2 | Gaepo 2-dong | 52 | 0.163 | 0.004 | 3 | 0.188 | 0.000506 |
| 3 | Gaepo 4-dong | 24 | 0.155 | 0.006 | 3 | 0.190 | 0.000511 |
| 4 | Nonhyeon 1-dong | 41 | 0.383 | 0.007 | 1 | 0.265 | 0.000828 |
| 5 | Nonhyeon 2-dong | 28 | 0.444 | 0.010 | 1 | 0.258 | 0.000742 |
| 6 | Daechi 1-dong | 30 | 0.186 | 0.006 | 2 | 0.211 | 0.000528 |
| 7 | Daechi 2-dong | 53 | 0.070 | 0.001 | 2 | 0.214 | 0.000541 |
| 8 | Daechi 4-dong | 31 | 0.357 | 0.008 | 2 | 0.213 | 0.000595 |
| 9 | Dogok 1-dong | 44 | 0.185 | 0.003 | 2 | 0.210 | 0.000548 |
| 10 | Dogok 2-dong | 54 | 0.075 | 0.001 | 2 | 0.205 | 0.000506 |
| 11 | Samsung 1-dong | 35 | 0.212 | 0.005 | 1 | 0.260 | 0.000694 |
| 12 | Samsung 2-dong | 51 | 0.241 | 0.004 | 1 | 0.261 | 0.000737 |
| 13 | Segok-dong | 25 | 0.108 | 0.003 | 3 | 0.190 | 0.000522 |
| 14 | Sooseo-dong | 40 | 0.259 | 0.005 | 3 | 0.187 | 0.000516 |
| 15 | Shinsa-dong | 36 | 0.218 | 0.005 | 1 | 0.251 | 0.000682 |
| 16 | Apgujeong-dong | 36 | 0.156 | 0.005 | 1 | 0.245 | 0.000639 |
| 17 | Yeoksam 1-dong | 58 | 0.365 | 0.004 | 2 | 0.219 | 0.000717 |
| 18 | Yeoksam 2-dong | 55 | 0.267 | 0.005 | 2 | 0.211 | 0.000562 |
| 19 | Ilwon 1-dong | 38 | 0.218 | 0.005 | 3 | 0.190 | 0.000513 |
| 20 | Ilwon 2-dong | 44 | 0.167 | 0.003 | 3 | 0.188 | 0.000500 |
| 21 | Ilwonbon-dong | 46 | 0.242 | 0.004 | 3 | 0.184 | 0.000490 |
| 22 | Cheongdam-dong | 49 | 0.135 | 0.002 | 1 | 0.256 | 0.000692 |

Y_d, direct estimate; Var_Y_d, variance of Y_d; Y_s, synthetic estimate; var_Y_s, variance of Y_s.

Table 8.

Results of estimating the current smoking rates by dong applying four types of composite estimate

| Eup/myeon/dong | n | Y_d | V_d | Y_c1 | V_c1 | Y_c2 | V_c2 | Y_c3 | V_c3 | Y_c4 | V_c4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Nonhyeon 1-dong | 41 | 0.3832 | 0.0063 | 0.2787 | 0.0007 | 0.2787 | 0.0007 | 0.3832 | 0.0063 | 0.3310 | 0.0021 |
| Nonhyeon 2-dong | 28 | 0.4439 | 0.0096 | 0.2718 | 0.0007 | 0.2802 | 0.0007 | 0.4439 | 0.0096 | 0.3579 | 0.0029 |
| Samsung 1-dong | 35 | 0.2118 | 0.0052 | 0.2542 | 0.0006 | 0.2542 | 0.0006 | 0.2119 | 0.0052 | 0.2330 | 0.0018 |
| Samsung 2-dong | 51 | 0.2407 | 0.0042 | 0.2584 | 0.0006 | 0.2591 | 0.0006 | 0.2407 | 0.0042 | 0.2495 | 0.0015 |
| Shinsa-dong | 36 | 0.2180 | 0.0052 | 0.2476 | 0.0006 | 0.2475 | 0.0006 | 0.2180 | 0.0052 | 0.2328 | 0.0018 |
| Apgujeong-dong | 36 | 0.1564 | 0.0051 | 0.2352 | 0.0006 | 0.2347 | 0.0006 | 0.1564 | 0.0051 | 0.1958 | 0.0017 |
| Cheongdam-dong | 49 | 0.1351 | 0.0024 | 0.2288 | 0.0005 | 0.2421 | 0.0006 | 0.1351 | 0.0024 | 0.1820 | 0.0010 |
| Daechi 1-dong | 30 | 0.1862 | 0.0059 | 0.2088 | 0.0005 | 0.2079 | 0.0005 | 0.1862 | 0.0059 | 0.1975 | 0.0018 |
| Daechi 2-dong | 53 | 0.0703 | 0.0010 | 0.1642 | 0.0004 | 0.1965 | 0.0004 | 0.0703 | 0.0010 | 0.1173 | 0.0005 |
| Daechi 4-dong | 31 | 0.3576 | 0.0077 | 0.2232 | 0.0006 | 0.2303 | 0.0006 | 0.3576 | 0.0077 | 0.2904 | 0.0023 |
| Dogok 1-dong | 44 | 0.1855 | 0.0034 | 0.2064 | 0.0005 | 0.2068 | 0.0005 | 0.1855 | 0.0034 | 0.1960 | 0.0012 |
| Dogok 2-dong | 54 | 0.0752 | 0.0011 | 0.1650 | 0.0003 | 0.1894 | 0.0004 | 0.0752 | 0.0011 | 0.1201 | 0.0005 |
| Yeoksam 1-dong | 58 | 0.3655 | 0.0046 | 0.2390 | 0.0006 | 0.2369 | 0.0006 | 0.3655 | 0.0046 | 0.3023 | 0.0016 |
| Yeoksam 2-dong | 55 | 0.2670 | 0.0053 | 0.2162 | 0.0005 | 0.2175 | 0.0005 | 0.2670 | 0.0053 | 0.2416 | 0.0017 |
| Gaepo 1-dong | 51 | 0.1494 | 0.0030 | 0.1803 | 0.0004 | 0.1816 | 0.0004 | 0.1494 | 0.0030 | 0.1648 | 0.0011 |
| Gaepo 2-dong | 52 | 0.1632 | 0.0041 | 0.1857 | 0.0005 | 0.1859 | 0.0005 | 0.1632 | 0.0041 | 0.1744 | 0.0014 |
| Gaepo 4-dong | 24 | 0.1552 | 0.0069 | 0.1874 | 0.0005 | 0.1863 | 0.0005 | 0.1610 | 0.0048 | 0.1742 | 0.0016 |
| Segok-dong | 25 | 0.1082 | 0.0036 | 0.1800 | 0.0005 | 0.1822 | 0.0005 | 0.1083 | 0.0036 | 0.1441 | 0.0012 |
| Sooseo-dong | 40 | 0.2596 | 0.0051 | 0.1933 | 0.0005 | 0.1940 | 0.0005 | 0.2596 | 0.0051 | 0.2264 | 0.0016 |
| Ilwon 1-dong | 38 | 0.2184 | 0.0053 | 0.1930 | 0.0005 | 0.1934 | 0.0005 | 0.2184 | 0.0053 | 0.2057 | 0.0017 |
| Ilwon 2-dong | 44 | 0.1670 | 0.0035 | 0.1860 | 0.0004 | 0.1865 | 0.0004 | 0.1670 | 0.0035 | 0.1765 | 0.0012 |
| Ilwonbon-dong | 46 | 0.2416 | 0.0045 | 0.1894 | 0.0004 | 0.1895 | 0.0004 | 0.2416 | 0.0045 | 0.2155 | 0.0015 |

Y_d, direct estimate; V_d, variance of Y_d; Y_c1, first weight composite estimate; V_c1, variance of Y_c1; Y_c2, second weight composite estimate; V_c2, variance of Y_c2; Y_c3, third weight composite estimate; V_c3, variance of Y_c3; Y_c4, composite estimate using average weight of first and third weights; V_c4, variance of Y_c4.