Trialling diagnosis-related groups classification in the Iranian health system: a case study examining the feasibility of introducing casemix

S. Ghaffari,¹,² C. Doran,¹ A. Wilson¹ and C. Aisbett³

Trialling classification by diagnosis-related groups in the Iranian health system: a case study of the feasibility of introducing casemix

Shahram Ghaffari, Christopher Doran, Andrew Wilson, Christopher Aisbett

Abstract: This paper examines the quality of information routinely collected in an Iranian hospital in a trial of casemix classification. The researchers used the Australian Refined Diagnosis Related Groups tool to classify patient episodes. They identified 327 diagnosis-related groups, 20% of which had a single case, and the grouper program identified invalid records in 4% of all separations. Approximately 4.5% of cases were classified into error diagnosis-related groups, while 3.4% of cases were ungroupable. No complications or comorbidities were identified in 93% of all cases. The value of R2 (variance in length of stay explained) was 44% for untrimmed cases, rising to 63% after trimming by the L3H3 method, and to 57% and 58% after trimming by the IQR and 10th–95th percentile methods respectively.

ABSTRACT This paper examines the quality of routinely collected information in an Iranian hospital in a trial of casemix classification. Australian Refined Diagnosis Related Groups (AR-DRG) were used to classify patient episodes. There were 327 DRGs identified, of which 20% had only 1 case. The grouper program identified invalid records for 4% of total separations. Approximately 4.5% of cases were classified into error DRGs and 3.4% were ungroupable. No complication and comorbidity effects were identified with 93% of total cases. R2 (variance in length of stay explained) was 44% for untrimmed cases, increasing to 63%, 57% and 58% after trimming by L3H3, IQR and 10th–95th percentile methods respectively.

Trial of diagnosis-related group classification in the Iranian health system: a case study analysing the feasibility of introducing casemix

SUMMARY This article examines the quality of information collected routinely in an Iranian hospital as part of a trial of casemix classification. The Australian refined version of diagnosis-related groups (DRGs) was used to classify patient episodes. In total, 327 DRGs were identified, of which 20% contained only 1 case. The grouper program identified invalid records for approximately 4% of all separations. Approximately 4.5% of cases were classified into error DRGs and 3.4% were ungroupable. No complication or comorbidity effect was identified in 93% of cases. R2 (variance in length of stay explained) was 44% for untrimmed cases, rising to 63%, 57% and 58% after trimming by the L3H3, IQR (interquartile range) and 10th–95th percentile methods respectively.

¹School of Population Health, University of Queensland, Brisbane, Australia (Correspondence to S. Ghaffari).

²Social Security Organization, Tehran, Islamic Republic of Iran.

³Laeta Pty Ltd, Randwick, New South Wales, Australia.

Received: 26/11/07; accepted: 25/05/08

EMHJ, 2010, 16(5), 460-466

Introduction

The Islamic Republic of Iran is preparing itself to implement the casemix budgeting system, commonly used in many developed countries, in its hospitals. Casemix uses the diagnosis-related group (DRG) system to classify acute inpatients and was initially designed for quality assurance but is now extensively used for funding purposes. DRG is a system for linking the acute inpatients that a hospital treats to the costs incurred by the hospital [1]. Episodes of care are classified into the different DRGs according to the principal and significant secondary diagnoses, main surgical procedure, types of separation (patient discharge, death or transfer), birth weight, age and sex of patient [2].

The Australian refined DRG (AR-DRG) is a non-proprietary and well-documented system used in Germany, Ireland, New Zealand, Singapore, Slovenia [3], and to some extent in some of our neighbouring countries such as Turkey and Saudi Arabia. Clinicians’ involvement in the development of AR-DRG, and its regular updating as clinical practice changes, distinguishes it from other systems.

Despite significant improvements in some areas over past decades, the Islamic Republic of Iran, a lower-middle-income country, still has unresolved “inefficiencies and inequalities” in its health care system [4]. In 2003, total health expenditure per capita was estimated at US$ 498, equivalent to 6.5% of gross domestic product [5]. The country has 1.7 beds per 1000 population, with an occupancy rate of 56% in state-owned hospitals. Hospitals are funded through annual budgeting, in which inefficiency is a major problem attributable to poor managerial systems. According to the Iranian National Health Account, hospitals consume approximately 36% of total annual health expenditure in the country [4]. However, health managers estimate the true level of expenditure to be higher than this.

This paper studies the feasibility of applying the AR-DRG classification in hospitals run by the Iranian Social Security Organization. It examines the adequacy and quality of routinely collected hospital information, identifies problems associated with DRG classification and recommends the improvements required. The study hospital was comparable to other larger Iranian hospitals in terms of information systems.

Methods

Study setting

Patients’ demographic and clinical information for the year 2003–04 was obtained from Kashani hospital, a well-established hospital in Tehran, Islamic Republic of Iran. It has 126 beds in use and provides both outpatient and inpatient services. Inpatient services are provided by 12 wards, including surgical (general, eye, orthopaedic, urology and ear, nose and throat), internal medicine, paediatrics, maternity, coronary care and intensive care. In 2003–04, the hospital reported 437 738 outpatient and 11 674 inpatient occasions of service (i.e. number of admissions or number of admitted patients).

Coding

While procedures are coded by physicians in the Iranian system, clinical coders are responsible for assigning diseases to the appropriate International Classification of Diseases (ICD) codes. The ICD 10th revision, Australian modification (ICD-10-AM) and the ICD 9th revision, clinical modification (ICD-9-CM) codes are used for coding diseases and procedures respectively. A mapping algorithm was used to map Iranian procedure codes into the Australian version. The code mapping was similar to that done for the evaluation of AR-DRG for Irish hospitals and was further refined using the Australian experience of changing from ICD-9-CM to ICD-10-AM, of changes between versions of AR-DRG and of using National Centre for Classification in Health maps between editions of ICD-10-AM [6].

The collected data were input into a grouper devised by Laeta, a specialist health information company. The grouper is a computer-based software program that assigns patient episodes into DRG classes and assesses the quality and adequacy of the documentation system through identifying invalid or missing data, including patient age, sex, length of stay, principal and secondary diagnosis codes, procedure codes, etc. [7].

The AR-DRG system uses 4 alphanumeric characters and classifies patient episodes into 665 DRGs, 23 major diagnostic categories (MDC) and error DRGs by the following sequential steps [1]: demographic and clinical edits; assignment of MDC using the principal diagnosis; pre-MDC processing, which includes records for very high cost case-types; partitioning of MDC, in which patients are classified into medical, surgical or other partitions; assignment of adjacent-DRG, which classifies patients based on the resource consumption level; assignment of the complications and comorbidity level and patient clinical complexity level; and finally assignment of DRG. Cases that have very high and variable cost or cases that cannot be classified into any MDC based on principal diagnosis are grouped into the pre-MDC class.
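The 4-character code structure can be illustrated with a small sketch. The function name `describe_ar_drg` is hypothetical, and the numeric partition ranges (01–39 surgical, 40–59 other, 60–99 medical), the use of "A" for pre-MDC and "9" for error DRGs are assumptions taken from general AR-DRG documentation rather than from this paper. It only splits a code into its parts; it does not perform grouping.

```python
def describe_ar_drg(code):
    """Split a 4-character AR-DRG code into its parts (illustrative only)."""
    if len(code) != 4 or not code[1:3].isdigit():
        raise ValueError(f"not a 4-character AR-DRG code: {code!r}")
    group, number, split = code[0], int(code[1:3]), code[3]
    # Assumed conventions (not stated in the paper):
    # '9' prefixes error DRGs, 'A' prefixes pre-MDC DRGs,
    # and the 2-digit number indicates the partition.
    if group == "9":
        partition = "error"
    elif group == "A":
        partition = "pre-MDC"
    elif number <= 39:
        partition = "surgical"
    elif number <= 59:
        partition = "other"
    else:
        partition = "medical"
    return {
        "group": group,            # broad group letter (or '9' for errors)
        "adjacent_drg": code[:3],  # first 3 characters
        "partition": partition,
        "split": split,            # resource split; 'Z' means no split
    }


if __name__ == "__main__":
    for example in ("C16A", "A06Z", "960Z"):
        print(example, describe_ar_drg(example))
```

Under these assumptions, C16A (lens procedure) falls in the surgical partition, A06Z is a pre-MDC DRG and 960Z is an error DRG, which matches how these codes are described in the Results.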

Coefficient of variation

The coefficient of variation, which is the standard deviation divided by the mean, often multiplied by 100 to give a percentage [8], was used to measure the variation in length of stay for individuals within each DRG [9]. A coefficient of variation less than 100 reflects acceptable within-group homogeneity [9] and measures the meaningfulness of the classification system. Reduction in variance (R2) was used to measure the extent to which the dispersion of length of stay could be explained by this grouping. R2 is an overall measure of how well patients are classified into acceptable groups on the basis of resource consumption [10] and how well the classification system performs in our setting. Values of R2 range from 0 (no reduction) to 1 (perfect match). Stata, version 9.2 was used to calculate the coefficient of variation and R2. R2 was computed as follows [10]:

\[ R^2 = \frac{\sum_i (y_i - A)^2 - \sum_i (y_i - A_g)^2}{\sum_i (y_i - A)^2} \]

where yi is the value of the variable (i.e. length of stay) for the ith patient, A is the average value for the variable in the database and Ag is the average value of the variable in DRG g. The square of the difference between the actual (yi) and the predicted value (A or Ag) is a measure of the variation in the data.
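The two measures described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the Stata procedure used in the study: the function names are hypothetical, the sample standard deviation is an assumption (the paper does not say which form was used), and records are assumed to be simple (DRG, length of stay) pairs.

```python
from collections import defaultdict
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean, as a percentage.

    Returns None for groups with fewer than 2 cases, where the
    sample standard deviation is undefined (an assumption made here).
    """
    if len(values) < 2:
        return None
    return stdev(values) / mean(values) * 100


def reduction_in_variance(records):
    """R2 for an iterable of (drg, length_of_stay) pairs."""
    by_drg = defaultdict(list)
    for drg, los in records:
        by_drg[drg].append(los)
    all_los = [los for _, los in records]
    overall_mean = mean(all_los)                      # A in the text
    total_ss = sum((y - overall_mean) ** 2 for y in all_los)
    within_ss = 0.0
    for group in by_drg.values():
        group_mean = mean(group)                      # Ag in the text
        within_ss += sum((y - group_mean) ** 2 for y in group)
    return (total_ss - within_ss) / total_ss


if __name__ == "__main__":
    toy = [("A", 1), ("A", 3), ("B", 5), ("B", 7)]
    print(reduction_in_variance(toy))
    print(coefficient_of_variation([1, 3]))
```

For the toy data the total sum of squares is 20 and the within-DRG sum of squares is 4, so R2 is 0.8; a DRG would count as acceptably homogeneous when its coefficient of variation is below 100.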

Trimming

Trimming, which is a method of excluding outliers (unusual length of stay or cost), was applied to approximate a normal distribution. We used 3 different trimming methods to identify outlier cases:

In the L3H3 method, the low- and high-stay trim-points for every DRG equal the average length of stay for the DRG divided and multiplied by 3 respectively [11].

In the interquartile range (IQR) method, the low and high trim-points are calculated as Q1 − 1.5 × (Q3 − Q1) and Q3 + 1.5 × (Q3 − Q1) respectively, where Q1 and Q3 refer to the 1st and 3rd quartiles of the distribution [12].

In the 10th–95th percentile method, the low trim-point is the 10th percentile, the point at which at least 90% of patients have a length of stay greater than or equal to it [13]. The high trim-point is the 95th percentile, the point at which 95% of patients have a length of stay less than or equal to it [14].

The patients who fall between low and high trim-points are known as inliers. The percentage of outlier cases was used as an indicator to measure the appropriateness of the classification algorithms and trimming methods in our setting.
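The three trimming rules can be sketched as follows for the lengths of stay within a single DRG. The function names are hypothetical, and the quantile interpolation (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since the paper does not state how quartiles and percentiles were computed.

```python
from statistics import mean, quantiles


def l3h3_trim_points(los):
    """Low = mean / 3, high = mean * 3."""
    avg = mean(los)
    return avg / 3, avg * 3


def iqr_trim_points(los):
    """Low = Q1 - 1.5 * IQR, high = Q3 + 1.5 * IQR."""
    q1, _, q3 = quantiles(los, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - 1.5 * spread, q3 + 1.5 * spread


def percentile_trim_points(los):
    """Low = 10th percentile, high = 95th percentile."""
    cuts = quantiles(los, n=100, method="inclusive")  # 99 cut points
    return cuts[9], cuts[94]


def split_inliers(los, trim_points):
    """Return (inliers, outliers); trim-points themselves count as inliers."""
    low, high = trim_points
    inliers = [x for x in los if low <= x <= high]
    outliers = [x for x in los if not low <= x <= high]
    return inliers, outliers


if __name__ == "__main__":
    stays = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]
    for rule in (l3h3_trim_points, iqr_trim_points, percentile_trim_points):
        inliers, outliers = split_inliers(stays, rule(stays))
        print(rule.__name__, outliers, 100 * len(outliers) / len(stays))
```

In practice the rule would be applied DRG by DRG, and the share of outliers across all separations is the indicator described above.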

Results

The main findings of classifying 11 674 inpatient occasions of service are presented in this paper. Further details are available from the corresponding author on request.

Quality of hospital records

Table 1 provides an overview of the valid and invalid or missing information identified by the grouper. The code 00 shows a normal grouping condition (96.6%) and all other codes show some sort of problem identified by the grouper, including missing or invalid principal diagnosis (code 01), invalid age (code 04), invalid sex (code 05), invalid length of stay (code 08) and invalid same day separation (code 09).
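A tally of the grouper's edit codes of the kind summarized in Table 1 could be produced along these lines. The code labels are those listed above; the function name and the input format (one code string per separation) are assumptions for illustration, not the Laeta grouper's actual output format.

```python
from collections import Counter

# Edit codes as described in the text.
EDIT_CODES = {
    "00": "normal grouping",
    "01": "missing or invalid principal diagnosis",
    "04": "invalid age",
    "05": "invalid sex",
    "08": "invalid length of stay",
    "09": "invalid same-day separation",
}


def summarise_edit_codes(codes):
    """Return {code: (label, count, percentage of all separations)}."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {
        code: (EDIT_CODES.get(code, "unlisted code"), n, 100 * n / total)
        for code, n in sorted(counts.items())
    }


if __name__ == "__main__":
    sample = ["00", "00", "00", "04"]  # made-up codes, not study data
    for code, (label, n, pct) in summarise_edit_codes(sample).items():
        print(code, label, n, f"{pct:.1f}%")
```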

MDC assignment

The highest volume MDC, which encompassed 19% of total hospital separations, was diseases and disorders of the digestive system (MDC 06). Only a few cases fell into MDC 17 (neoplastic diseases), MDC 19 (mental diseases), MDC 20 (alcohol/drug use and alcohol/drug-induced organic mental disorders) and MDC 23 (factors influencing health status and other contacts with health services). There were no cases in MDC 22 (burns).

Pre-MDC processing and MDC partitioning

DRG A06Z (tracheostomy or ventilation > 95 hours) was the only DRG identified during pre-MDC processing. There were no cases of liver, lung, heart or renal transplant at this hospital. Almost 54% of total separations were classified as surgical and 46% of them were classified into the medical partition. Only 18 cases fell into the “other” partition (cases that had no operating room procedure but had at least 1 non-operating room procedure).

Complication and comorbidity level and assignment of patient clinical complexity level

The majority of the separations in the study hospital (93%) were assigned a value of 0, which means that diagnosis codes for the specific separation were not identified as complication and comorbidity codes or, if they were, they were closely related to the principal diagnosis. The remaining 7% of the patient records, with a varying degree of clinical complexity, were classified into patients’ clinical complexity levels 1 to 4 (Table 2). Almost 63% of total separations (7355) were discharged with a single diagnosis code and only 7% (818) of them were discharged with 3 or more diagnosis codes.

DRG assignment

There were 327 DRGs identified in the study, 20% of which had only 1 case and 47% of which had fewer than 5 cases. DRG C16A, lens procedure (8%), was the highest volume DRG identified in this study. About 3.4% of the hospital separations fell into DRG 960Z as “ungroupable”.
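The low-volume shares reported here are simple counts over the assigned DRG codes. A minimal sketch, assuming one DRG code per separation and using a hypothetical function name, is:

```python
from collections import Counter


def low_volume_shares(drg_codes, threshold=5):
    """Share of DRGs with exactly 1 case and with fewer than `threshold` cases."""
    volumes = Counter(drg_codes)
    n_drgs = len(volumes)
    single = sum(1 for n in volumes.values() if n == 1)
    few = sum(1 for n in volumes.values() if n < threshold)
    return {
        "drgs": n_drgs,
        "pct_single_case": 100 * single / n_drgs,
        "pct_below_threshold": 100 * few / n_drgs,
    }


if __name__ == "__main__":
    # Made-up assignments for illustration, not study data.
    codes = ["C16A"] * 6 + ["G03C"] * 2 + ["X07B", "A06Z"]
    print(low_volume_shares(codes))
```

Note that the percentages are taken over the number of distinct DRGs, not over the number of separations.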

Error DRGs

Approximately 4.5% of separations were classified into the error DRGs. DRG 960Z, which contains records with an invalid or missing principal diagnosis or other invalid essential information such as age, sex or admission weight for patients aged < 1 year, comprised 78% of the total error DRGs. DRG 901Z and 902Z (13%) include all procedures with codes unrelated to the principal diagnosis. DRG 961Z and 963Z (8%) include patient records with unacceptable principal diagnoses.

Length of stay

Length of stay ranged from 1 to 60 days, and the largest proportion of patients (34%) separated on the same day or the day after admission. Length of stay was not recorded for about 2% of the total separations. Average length of stay was 3.09 days for untrimmed cases and 3.06, 2.92 and 2.93 days for cases trimmed by the L3H3, IQR and 10th–95th percentile methods respectively. Approximately 5.2%, 5.5% and 4.2% of total separations were identified as outliers by the L3H3, IQR and 10th–95th percentile methods respectively.

DRG X07B (skin graft for injuries) had the highest average length of stay across all DRGs, at 23 days. Excluding DRGs with fewer than 5 cases, DRG G03C (stomach, oesophageal and duodenal procedure without malignancy) had the highest average length of stay (12.4 days). The highest volume DRG, C16A (lens procedures), had an average length of stay of 1.8 days. Among the MDCs, MDC 05 (6.70 days) and MDC 14 (1.62 days) had the highest and lowest average length of stay, respectively. The average lengths of stay were 4.6, 4.0 and 2.3 days for the other, medical and surgical partitions, respectively.

Table 3 shows a summary distribution of DRGs with a coefficient of variation < 100 and the variance in length of stay explained (R2) for untrimmed and trimmed data. The proportion of DRGs meeting this within-group homogeneity criterion increased from 90% for untrimmed data to 99% for data trimmed by the L3H3 method. The value of R2, which was 44% for untrimmed data, increased to 63%, 57% and 58% after trimming by the L3H3, IQR and 10th–95th percentile methods respectively. As our objective was not a comprehensive evaluation of the performance of the AR-DRG system, and our sample comprised only 1 hospital, we did not carry out a detailed analysis of R2 for DRGs at MDC level.

Discussion

Effective implementation of any casemix classification system requires accurate and thorough recording and coding of patients’ demographic, clinical and financial information. Although this information is usually available in a hospital’s discharge system, the quality of information and its availability through a computerized system are problematic in low-resource countries. Despite a national movement toward using ICD-10 codes in the Islamic Republic of Iran, many hospitals still do not apply ICD-10 or apply it partially. Surgical procedures in the study hospital were coded using ICD-9-CM. Although ICD codes were not primarily designed for DRG and casemix purposes, they are considered to be the “basic ingredient of casemix recipe” [15].

Applying a coding system compatible with the version of DRG which is to be employed for classification purposes is an important step toward a successful DRG classification trial. In this exploratory study, we used AR-DRG which requires ICD-10-AM coding (the Australian version of ICD-10). Mapping Iranian hospital data from ICD-9 to ICD-10 and ICD-10-AM would require additional effort and technical expertise which is probably not available across the country.

Mode of separation, which is a compulsory variable for completing DRG classification [16], was recorded in the hospital but not included in the software language of the grouper we employed. The complexity of the mapping process for medical record codes arose from the inconsistency in the Iranian coding, but this was a modest problem as we reviewed only 1 hospital. Mapping tries to facilitate grouping and is usually not a serious problem when the separation codes are used consistently. However, in the long term, it would be better to avoid it through upgrading the documentation system and/or choosing an appropriate version of DRG.

The accuracy of DRG assignment depends on the quality of data, which, in turn, has a direct impact on the usefulness of the information produced by the casemix system, whether for management or funding purposes [17]. In the Islamic Republic of Iran, procedures are recorded by physicians, and matching the procedures and diseases with appropriate ICD codes is the responsibility of coders. Poor coding practice, including failure to choose the correct principal diagnosis and to record all secondary diagnoses and main procedures, was the main shortcoming identified during this study. While the grouper can use up to 30 diagnosis and procedure codes, no more than 4 secondary diagnoses and 3 procedures were recorded at the study hospital. Secondary diagnoses and main procedures reflect the severity of the illness and are essential for correct grouping. Accuracy and completeness in documentation and proficiency of morbidity coders are essential for achieving a meaningful grouping [2]. Although general practitioners are employed to control coding accuracy, there is no standard quality control to secure the accuracy and consistency of coding at either the physician or the coder level, and the quality of coding is always questionable. There are still some coders in Iranian hospitals who have not been formally trained in coding clinical records.

The ICD coding of patient clinical information was 1 year behind. Information was stored in out-of-date software environments such as DOS and FoxPro. It is difficult to prepare data from these sources for commonly used contemporary data management software compatible with the grouper program (such as Microsoft Excel or Access). No qualified or experienced staff were available to upgrade the systems to fully match the study requirements.

Inaccurate and low-quality data result in error DRGs. We identified 6 error DRGs in the AR-DRG system (901Z, 902Z, 903Z, 960Z, 961Z and 963Z), all of which contain invalid or atypical information [18]. Errors in DRG classification occur because of poor-quality data arising from an invalid principal diagnosis, missing codes or data entry inaccuracies.

Other problems identified by the grouper arose from invalid data entry. Not all invalid information affects the grouping quality in the same way. For example, the grouper identified 137 instances of “invalid age” and 133 of “invalid sex”, which affected the grouping of 136 and 14 cases respectively. There were also 246 invalid lengths of stay and 109 invalid principal diagnoses that affected quality. Invalid information can be flagged as a “warning” or as “fatal”, depending on the size of the effect. A fatal flag identifies problems leading to error DRGs 960Z, 961Z and 963Z, for instance a conflict between principal diagnosis and sex for the obstetric DRGs [1]. Dealing with factors leading to error DRGs is critical for high-quality classification. Training and education are important in reducing coding errors [19] and are central to casemix implementation when the majority of staff have either no or very limited knowledge of casemix [20].

Admission weight and age in days for patients aged < 1 year, which are essential for DRG classification, were not collected in the system. The Laeta grouper default and international admission weight growth curve data were used to classify such patients into the relevant DRGs, but demographic differences between nations suggest the need for the development of a country-specific classification system. “Leave days”, an optional but useful field for identifying length of stay outside the acute setting, and a “same day flag” need to be recorded directly. Identifying the source of admission (emergency or elective) also provides useful information for utilization review and quality assurance programmes [21]. Electively admitted patients have a higher chance of getting standard care than emergency ones. There is no specific code in the study hospital to indicate patients’ admission status.

MDC and DRG provide worthwhile information about hospital activity which is useful for policy-making. Information provided by MDC is even more worthwhile where the number of separations falling into the DRG groups is small. The volume of cases classified into each DRG or MDC group is directly affected by poor data quality and by the choice of an incompatible version of DRG. Too many DRG classes mean that there are too few observations within each class, which in turn makes it hard to understand actual variations between hospitals. On the other hand, too few classes mean that there are too many heterogeneous cases within each class. In this case, placing a large number of dissimilar cases into 1 group makes it difficult to find real variation between doctors, nurses and hospital output [2]. In our single-hospital study 50% of the DRGs identified had too few observations, i.e. fewer than 5 observations within each DRG. Inaccurate cost weights could be the main problem arising from low-volume DRGs [9], and a larger sample would be needed to estimate cost weights.

Approximately 5% of total separations were identified as outliers after trimming by L3H3 and IQR, and 4% as outliers after trimming by the 10th–95th percentile method. It is normally accepted that an outlier proportion of more than 10% is too high, reflecting either inappropriate classification algorithms or trimming problems [22]. The overall high R2 values of 0.63, 0.57 and 0.58, after trimming using L3H3, IQR and 10th–95th percentile respectively, and the high proportion of DRGs with a coefficient of variation less than 100 (more than 90%), suggest that AR-DRG provides good explanatory power and within-group homogeneity in this hospital. The value of R2 (0.63 trimmed by L3H3) was comparable to that reported by other studies [12,23]. However, this requires replication in other Iranian hospitals because of the small sample size, the large proportion of DRGs with only 1 case (20%) and the low quality of the data.

Conclusion

Our study shows that DRG classification is achievable in this hospital run by the Iranian Social Security Organization using routinely collected information. However, to achieve a classification system to inform funding and management decisions, the following changes would need to occur: upgrading of the current computerized system including documentation systems at admission and discharge points; using ICD-10 for principal diagnoses across the hospital system; recording age in days and admission weight for patients < 1 year old; flagging same-day separations and recording admission type (emergency versus elective); and employing experienced, proficient and skilled coders.

The problems identified during the data editing and classification processes for this trial are likely to be relevant to other Iranian hospitals and other countries with similar hospital documentation infrastructure when planning and implementing casemix for either management or funding purposes. Although the result of R2 was acceptable by the usual benchmarks, studies with larger data sets and different classification systems are recommended if measurement of DRG performance is desired.

Acknowledgements

The authors gratefully acknowledge the valuable assistance of the staff at the Iranian Social Security Organization and Kashani hospital, particularly that of Dr Amir Abbas Manochehri, Majid Hasanian, Ali Mohammdai Sanjar and Aboulfazl Taheri in collecting the required information. We also acknowledge the support of the Iranian Social Security Organization for giving permission to publish these data.

References

1. Australian refined diagnosis related groups: version 5.1: definitions manual. Canberra, Commonwealth Department of Health and Ageing, 2004.
2. Eagar K, Hindle D. Casemix in Australia: an overview. Canberra, Department of Human Services and Health, 1994.
3. Hindle D. Implementing DRGs in Slovenia: why the Australian variation was selected. Australian health review, 2003, 26:11.
4. Iran national health accounts. Tehran, World Bank, 2004 (http://www.who.int/nha/docs/en/Iran_NHA_report_english.pdf, accessed 29 November 2009).
5. Core health indicators database, 2007. World Health Organization Statistical Information System [online database] (http://apps.who.int/whosis/database/core/core_select.cfm, accessed 29 November 2009).
6. Aisbett CW et al. Measuring hospital case mix: evaluation of alternative approaches for the Irish hospital system. Dublin, Ireland, Economic and Social Research Institute, 2007 (Paper WP192).
7. National hospital cost data collection: cost report, round 7 (2002–2003). Canberra, Commonwealth Department of Health and Ageing, 2004.
8. Bland M. An introduction to medical statistics. New York, Oxford University Press, 2000.
9. Reid B, Palmer G, Aisbett C. The performance of Australian DRGs. Australian health review, 2000, 23:20–31.
10. Averill R et al. The evolution of casemix measurement using diagnosis related groups (DRGs). Studies in health technology and informatics, 1994, 14:75–83.
11. Duckett SJ. Casemix funding for acute hospital inpatient services in Australia. eMJA, 1998, 19:s17–21.
12. Gong Z et al. Describing Chinese hospital activity with diagnosis related groups (DRGs): a case study in Chengdu. Health policy, 2004, 69:93–100.
13. Public hospital cost benchmarks 2004–05. Technical paper. Brisbane, Queensland Health, 2004.
14. Cole S, Stomfay B. R.I.P L3H3? In: Proceedings of the 15th Casemix Conference in Australia. Sydney, Commonwealth Department of Health and Aged Care, 2004.
15. Roberts RF, Innes KC, Walker SM. Introducing ICD-10-AM in Australian hospitals. Medical journal of Australia, 1998, 169:S32–5.
16. Aisbett C. Access grouper: a user friendly implementation of AR-DRG, version 5.0. Sydney, Laeta, 2002.
17. Reid B, Palmer GR, Aisbett C. Under-coding in Australia limits the performance of DRG groupers. Health information management, 1999/2000, 29:113–7.
18. Australian refined diagnosis related groups, version 5.0. Definitions manual. Canberra, Commonwealth Department of Health and Ageing, 2002.
19. Hay PJ, Pearce T. Casemix funding in psychiatry: some problems and common pitfalls. Australian health review, 1996, 19:125–33.
20. Ghaffari S, Doran C, Wilson A. Casemix in the Islamic Republic of Iran: current knowledge and attitudes of health care staff. Eastern Mediterranean health journal, 2008, 14(4):931–40.
21. Lichtig LK. Hospital information systems for casemix management. New York, Wiley, 1986.
22. Palmer G, Reid B. Evaluation of the performance of diagnosis-related groups and similar casemix systems: methodological issue. Health services management research, 2001, 14:71–82.
23. Jackson T. ANDRG3 and ARDRG4: how do they compare on resource homogeneity? In: Proceedings of the Annual Conference of Patient Classification System/Europe. Groningen, Netherlands, Patient Classification System/Europe, 2000.