Validation of the Arabic Strengths and Difficulties Questionnaire and the Development and Well-Being Assessment


A. Alyahri¹ and R. Goodman¹

ABSTRACT We examined the validity of the Arabic versions of 2 main measures of child psychopathology: the Strengths and Difficulties Questionnaire (SDQ) and the Development and Well-Being Assessment (DAWBA). They were administered to the parents and teachers of 2 samples of 5–12-year-old Yemeni children, one from psychiatric clinics (n = 87) and the other from the community (n = 100). The SDQ scores distinguished well between the 2 samples and also between children with different psychiatric diagnoses. The DAWBA showed substantial agreement with independent clinic diagnosis. The brevity of the SDQ and the respondent-based nature of the DAWBA interview make these tools feasible for use in countries where there is a severe shortage of skilled manpower.

Validation of the Arabic version of the Strengths and Difficulties Questionnaire and the Development and Well-Being Assessment

RÉSUMÉ We examined the validity of the Arabic version of 2 instruments for measuring child psychopathology: the Strengths and Difficulties Questionnaire (SDQ) and the Development and Well-Being Assessment (DAWBA). These instruments were used with the parents and teachers of 2 samples of Yemeni children aged 5 to 12 years, one drawn from psychiatric clinics (n = 87) and the other from the community (n = 100). SDQ scores distinguished clearly between the 2 samples and also between children with different psychiatric diagnoses. The DAWBA showed strong agreement with independent clinical diagnosis. The SDQ is a brief questionnaire and the DAWBA interview is structured, which makes these instruments usable in countries where there is a severe shortage of qualified personnel.

¹Department of Child and Adolescent Psychiatry, Institute of Psychiatry, King’s College, London, United Kingdom (Correspondence to A. Alyahri: alyahri32@hotmail.com).
EMHJ, 2006, 12(Supplement 2): 138-146

Introduction


There is growing recognition of the importance of child psychiatric disorders in developing countries. These disorders are important not only because they result in suffering for children and those around them, but also because they interfere with social and educational development, and can lead to life-long social and psychiatric problems [1]. There is a pressing need, particularly in developing countries with very limited access to child mental health professionals, to develop simple screening mechanisms to help ensure that referrals to child mental health services are appropriate. It would be unrealistic to develop screening mechanisms that depend on complex or expensive measures and have to be administered by highly trained staff [2].

The Strengths and Difficulties Questionnaire (SDQ) is a brief tool that could be a relatively cheap and easy screening measure. It was originally published in English [3], and has subsequently been translated into over 60 languages. The fact that the SDQ is predictive of psychiatric diagnoses in many developed and developing countries [4] raises the possibility that it might be useful as a screen for psychiatric disorders in community settings, primary health care or paediatric clinics in Yemen.

The Development and Well-Being Assessment (DAWBA) is a package of questionnaires and structured interviews that also collects answers to open-ended questions for clinical review. In developing countries, where there is a severe shortage of skilled manpower, use of structured interviews is the most cost-effective and feasible way of carrying out community surveys [5]. The DAWBA was initially designed for a nationwide epidemiological survey of common emotional and behavioural disorders in Britain [6]. Previous studies have provided evidence for the validity of the DAWBA in a developed country [6] as well as in a developing country [7].

Translating established measures is generally quicker and cheaper than developing new measures for each new language or country, and international measures have the additional advantage of facilitating international comparison. It is important, however, that standardized measures developed outside a particular culture and language community should be validated before their use in that setting [5].

This study aimed to validate the Arabic versions of 2 standardized child psychiatric measures (the SDQ and the DAWBA) for future clinical and research application in Yemen and in other Arabic-speaking countries.

Methods


Clinical sample

The clinical sample was obtained from Aden Neuropsychiatric Teaching Hospital (the main psychiatric hospital in Yemen), Alwahda Paediatric Teaching Hospital (the biggest paediatric hospital in Yemen) and school-based psychiatric clinics in Aden. These hospitals receive referrals from a wide catchment area, mainly from Aden, Abian and Lahj provinces. The children referred to these clinics are of mixed socioeconomic status. The SDQ and the DAWBA were administered to parents and to teachers. The interviewers were blind to the diagnosis made by the psychiatrist, the paediatrician or the psychologist. The SDQ was read out by the interviewer when the respondents’ literacy skills were insufficient for them to complete the questionnaire directly. The measures were completed for a consecutive series of 108 patients aged 5–12 years when first seen at the clinics between February and July 2002. Eleven were excluded because missing answers made it impossible to generate all scores, so the final sample consisted of the remaining 97 children. Teacher SDQs and DAWBAs were available for only 68 (70%), either because the child was not at school or because the parents did not consent to contact with the school.

The mean age of the children in the sample was 9.4 (standard deviation 3.1) years and 53 (54.6%) were male.

Community sample

The community sample was selected from schools in 2 areas of Aden city, Crater and Sheikh Othman. These areas were chosen to represent families whose socioeconomic status was similar in range to that of the families in the clinic sample. One hundred children aged 5–12 years were selected through a 2-stage sampling procedure. In the first stage, a clustered random sample of classes from the first 4 grades of primary school was selected from each school. In the second stage, 3 or 4 children were randomly selected from the class register of each of the 24 classes sampled in 3 boys’ schools and 3 girls’ schools; the rest were selected from 1 kindergarten.
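
The 2-stage selection described above can be sketched in Python as follows. This is an illustration only, not the procedure actually used in the study: the school and pupil identifiers, the `two_stage_sample` helper and the assumption of 4 classes drawn per school (which gives 24 classes across 6 schools) are all hypothetical.

```python
import random


def two_stage_sample(schools, classes_per_school, rng):
    """Stage 1: draw classes at random within each school.
    Stage 2: draw 3 or 4 children at random from each chosen class register."""
    selected = []
    for school, registers in schools.items():
        # Stage 1: random sample of classes (clusters) from this school.
        chosen_classes = rng.sample(sorted(registers), classes_per_school)
        for class_id in chosen_classes:
            register = registers[class_id]
            # Stage 2: 3 or 4 children per class, as described in the text.
            n_children = min(rng.choice([3, 4]), len(register))
            for child in rng.sample(register, n_children):
                selected.append((school, class_id, child))
    return selected


# Hypothetical registers: 6 schools, each with 8 classes of 30 pupils.
schools = {
    f"school_{s}": {
        f"class_{c}": [f"s{s}_c{c}_pupil_{p}" for p in range(30)]
        for c in range(8)
    }
    for s in range(6)
}
sample = two_stage_sample(schools, classes_per_school=4, rng=random.Random(0))
```

With these assumed inputs the sketch yields between 72 and 96 children from 24 classes; in the study the remainder of the 100 children came from the kindergarten.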

Parents were visited at home or seen in schools and all agreed to take part in the study. Parents and teachers completed SDQs, with the questionnaire being read out when the respondent did not have adequate literacy skills. Complete parent and teacher SDQ information was available on all 100 children. The mean age of the sample was 8.9 (standard deviation 1.6) years and 51% were male. The community and clinic samples were matched for sex (χ² = 0.63, df = 1, P = 0.4) and did not differ significantly for age (P = 0.09; 95% confidence interval –0.99 to 0.07).

Assessments were carried out between January 2002 and April 2002, avoiding the first term when teachers do not yet know their pupils, and avoiding the end of the last term when teachers are often busy with the end-of-year examinations.

Inclusion/exclusion criteria

Clinical sample

Children aged 5–12 years who had been referred to the clinics were included. All consecutive referrals were eligible except for children with moderate or severe learning disability and children whose only problem was enuresis, epilepsy or a specific learning disorder.

Community sample

All children between 5 and 12 years of age attending the kindergarten and the first 4 grades of the 7 selected schools were eligible.

Measures


The SDQ is a brief behavioural screening questionnaire that covers 25 attributes, some positive and others negative [3]. The 25 items are divided between 5 scales of 5 items each, generating scores for conduct problems, inattention-hyperactivity, emotional symptoms, peer problems and prosocial behaviour. All scales but the last are summed to generate a total difficulties score (range 0–40). The same questionnaire can be completed by the parents or the teachers of 4–16-year-olds [3]. Besides covering common areas of emotional and behavioural difficulties, it also inquires whether the informant thinks that the child has a problem in these areas and, if so, asks about resultant distress and social impairment [8]. The web site at http://www.sdqinfo.com provides more information and downloadable questionnaires in many languages and scoring instructions.
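
As a minimal sketch of the scoring structure just described, the Python function below sums 5 items per scale and then adds the 4 problem scales into a total difficulties score. It assumes each item is scored 0, 1 or 2 (consistent with the stated 0–40 range) and that any reverse-keyed items have already been recoded by the caller. The scale labels and the `score_sdq` helper are illustrative names, not part of the official scoring materials available from the SDQ web site.

```python
DIFFICULTY_SCALES = ("emotional", "conduct", "hyperactivity", "peer")
ALL_SCALES = DIFFICULTY_SCALES + ("prosocial",)


def score_sdq(item_scores):
    """Return the 5 scale scores plus the total difficulties score.

    item_scores maps each scale name to its 5 item scores (each 0, 1 or 2),
    already recoded so that a higher score means more of the named attribute.
    """
    scale_scores = {}
    for scale in ALL_SCALES:
        items = item_scores[scale]
        if len(items) != 5 or any(i not in (0, 1, 2) for i in items):
            raise ValueError(f"{scale}: expected 5 items scored 0-2")
        scale_scores[scale] = sum(items)  # each scale ranges 0-10
    # Total difficulties excludes the prosocial scale (range 0-40).
    scale_scores["total_difficulties"] = sum(
        scale_scores[s] for s in DIFFICULTY_SCALES
    )
    return scale_scores


# Hypothetical responses for one child.
example = {
    "emotional": [1, 0, 2, 1, 0],
    "conduct": [0, 0, 1, 0, 0],
    "hyperactivity": [2, 2, 1, 2, 1],
    "peer": [0, 1, 0, 0, 1],
    "prosocial": [2, 2, 1, 2, 2],
}
scores = score_sdq(example)
```

For this hypothetical child the total difficulties score is 4 + 1 + 8 + 2 = 15, and the prosocial score (9) is reported separately.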

The SDQ had previously been translated into Arabic. This translation was revised to maximize understanding in Yemen, and back-translated to ensure fidelity to the original English version.

The DAWBA is a package of questionnaires, interviews and rating techniques designed to generate ICD-10 and DSM-IV psychiatric diagnoses on 5–16-year-olds [9,10]. It is designed so that non-clinical interviewers can administer a structured interview to parents about psychiatric symptoms and resultant impact. When definite symptoms are identified by the structured questions, interviewers use open-ended questions and supplementary prompts to get parents to describe the problems in their own words. These descriptions are transcribed verbatim by the interviewers but are not rated by them. Teachers complete a brief questionnaire covering the main conduct, emotional, and hyperactivity symptoms and any resultant impairment. The different sorts of information are brought together by a computer program that also predicts likely diagnoses. These computer-generated summary sheets and diagnoses form a convenient starting point for experienced clinical evaluators, who decide whether to accept or overturn the computer diagnosis (or lack of diagnosis) in the light of their review of all the data, including transcripts [6].

The DAWBA was translated from English to Arabic by the first author. To ensure translation equivalence, a back-translation was done in the Faculty of Linguistics and Translation, Ajman University, United Arab Emirates. Conceptual and linguistic problems were resolved by extensive consultation with the second author, who wrote the original version, and a number of Arab psychiatrists. The translated version was piloted on a number of mothers from various Arab countries at the Islamic Welfare Centre in London. The web site at http://www.dawba.com provides more information, and downloadable versions of the interviews and questionnaires in many languages.

Selection and training of interviewers

Four female interviewers were selected (the participation rate was expected to be higher with female interviewers): 3 were experienced clinical psychologists and 1 was a psychiatric senior house officer. The mode of introducing the DAWBA and the SDQ was explained in detail. Training on the theoretical aspects was carried out for 1 week, followed by role-play exercises and then a field-training session in the clinic.

Clinical diagnosis

Children from the psychiatric and the paediatric clinics were assigned clinical diagnoses based on the operationalized criteria of either ICD-10 or DSM-IV. These clinical diagnoses were made at the time of initial assessment by a psychiatrist, a clinical psychologist or a paediatrician. The diagnoses were made blind to the children’s SDQ and DAWBA assessments.

Diagnoses were collapsed into 2 broad categories to provide cell sizes that would be sufficient for meaningful analysis and also to avoid misclassification of subcategory diagnoses, as the initial assessments were carried out by a variety of professionals at different levels of seniority. The categories were externalizing disorder (hyperkinetic, conduct and oppositional disorders) and emotional disorder (anxiety, depressive and obsessive compulsive disorders). Overall, 30 children had an emotional disorder and 79 had an externalizing disorder (with 12 individuals having both).

Development and Well-Being Assessment diagnosis

All open-ended comments were translated into English by one of the authors. Cultural and linguistic nuances in each case were discussed by the authors, after which the DAWBA diagnosis was made by the second author, who had previously made or supervised many thousands of DAWBA diagnoses. The rating consisted of reviewing information from the structured and open-ended questions and re-evaluating the computer diagnosis to provide a final clinical diagnosis based on DSM-IV and ICD-10 diagnostic criteria for each subject [9,10]. Diagnoses were collapsed into the same 2 broad categories adopted for the clinical diagnosis, again to provide cell sizes sufficient for meaningful analysis.

Statistical analysis

Validity of the Development and Well-Being Assessment

The validity of the translated version was primarily tested by examining the correspondence between clinical diagnosis and DAWBA diagnosis (external validity), using the kappa coefficient (κ) interpreted according to the criteria originally proposed by Landis and Koch [11]. If there is complete agreement, then κ = 1. If there is no more agreement than would be expected by chance alone, then κ = 0.
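
For readers unfamiliar with κ, the following Python sketch shows how it can be computed from a 2 × 2 cross-tabulation of two diagnostic sources. The function names and the example counts are hypothetical and are not taken from the study's tables; the labels in `landis_koch_label` follow the bands commonly quoted from Landis and Koch.

```python
def cohen_kappa(both_yes, dawba_only, clinic_only, both_no):
    """Cohen's kappa for two raters and a present/absent diagnosis."""
    n = both_yes + dawba_only + clinic_only + both_no
    observed = (both_yes + both_no) / n  # proportion of agreement
    dawba_yes = both_yes + dawba_only
    clinic_yes = both_yes + clinic_only
    # Agreement expected by chance, from the marginal totals.
    expected = (
        dawba_yes * clinic_yes + (n - dawba_yes) * (n - clinic_yes)
    ) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Verbal label for kappa using the commonly quoted Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical counts: 20 agreed cases, 70 agreed non-cases, 10 disagreements.
kappa = cohen_kappa(both_yes=20, dawba_only=5, clinic_only=5, both_no=70)
label = landis_koch_label(kappa)
```

In this made-up example the observed agreement is 90% but chance agreement is 62.5%, so κ is about 0.73, which falls in the "substantial" band.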

Validity of the Strengths and Difficulties Questionnaire

The ability of different SDQ scales to distinguish between community and clinic subjects was examined using receiver operating characteristic (ROC) curves, employing the area under the curve as the index of discriminant ability. For this purpose, the underlying assumption was that the children in the clinic sample were substantially more likely to have psychiatric disorders than were the children in the community sample (i.e. the relevant psychiatric disorders were more common in the high-risk than in the low-risk group). In ROC analyses, sensitivity (percentage of correctly identified “cases”) and specificity (percentage of correctly classified healthy “non-cases”) are calculated for all possible cut-off points of a score, and then combined in a single value called “area under the curve” (AUC). The AUC value obtained in this way reflects the discriminant validity. As a guide to interpretation, the AUC is 1.0 for a measure that discriminates perfectly and 0.5 for a measure that has no better than chance accuracy. With the number of subjects in this study, an AUC of ≥ 0.6 is significantly better than chance.
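
The ROC calculation outlined above can be illustrated with a short Python sketch. It is not the software used in the study: the function names and the example scores are hypothetical. The AUC is computed here as the proportion of case/non-case pairs in which the case has the higher score (ties count one half), which equals the area under the empirical ROC curve.

```python
def roc_points(case_scores, noncase_scores):
    """Sensitivity and specificity at every cut-off, where a score at or
    above the cut-off is treated as screen-positive."""
    cutoffs = sorted(set(case_scores) | set(noncase_scores))
    points = []
    for cutoff in cutoffs:
        sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
        specificity = sum(s < cutoff for s in noncase_scores) / len(noncase_scores)
        points.append((cutoff, sensitivity, specificity))
    return points


def auc(case_scores, noncase_scores):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    wins = 0.0
    for x in case_scores:
        for y in noncase_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5  # ties contribute half a win
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical total difficulties scores.
clinic = [18, 22, 15, 25, 12]
community = [8, 12, 5, 10, 14]
area = auc(clinic, community)
points = roc_points(clinic, community)
```

For these made-up scores the AUC is 0.94. For a scale on which lower scores indicate more difficulty, such as prosocial behaviour, the scores would need to be reversed (or the two groups swapped) before applying the same calculation.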

To generate ROC curves for each SDQ scale, the community sample was compared with the most relevant operationalized categorical diagnoses derived from the DAWBA. For 4 of the scales (total difficulties, total impact, peer problems and prosocial behaviour), the comparison was between all those in the community group and all those in the clinic group. The remaining 3 scales (emotional, conduct and hyperactivity symptoms) were judged by comparing the entire community sample with those clinic cases who had the corresponding disorder, as diagnosed by the DAWBA. For example, the discriminant power of the SDQ emotional scale was judged by comparing all community subjects with those children who had been diagnosed by the DAWBA as having an emotional disorder.
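
The choice of comparison groups described above can be expressed as a small Python helper. This is a sketch under assumed data structures: each child is represented as a dictionary with a `scores` entry and, for clinic cases, a `dawba` set of diagnostic groupings. The scale and disorder labels are hypothetical names chosen for illustration.

```python
# Scales compared between the whole clinic and the whole community sample.
GLOBAL_SCALES = {"total_difficulties", "total_impact", "peer", "prosocial"}

# Scales compared against clinic cases with the corresponding DAWBA diagnosis.
SCALE_TO_DISORDER = {
    "emotional": "emotional",
    "conduct": "conduct",
    "hyperactivity": "hyperkinetic",
}


def comparison_groups(scale, community, clinic):
    """Return (case_scores, noncase_scores) for the ROC analysis of one scale."""
    if scale in GLOBAL_SCALES:
        cases = clinic
    else:
        disorder = SCALE_TO_DISORDER[scale]
        cases = [child for child in clinic if disorder in child["dawba"]]
    case_scores = [child["scores"][scale] for child in cases]
    noncase_scores = [child["scores"][scale] for child in community]
    return case_scores, noncase_scores


# Hypothetical records.
community = [
    {"scores": {"emotional": 2, "total_difficulties": 8}},
    {"scores": {"emotional": 1, "total_difficulties": 5}},
]
clinic = [
    {"scores": {"emotional": 7, "total_difficulties": 20}, "dawba": {"emotional"}},
    {"scores": {"emotional": 3, "total_difficulties": 22}, "dawba": {"hyperkinetic"}},
]
emotional_groups = comparison_groups("emotional", community, clinic)
total_groups = comparison_groups("total_difficulties", community, clinic)
```

Here only the first clinic child enters the emotional-scale comparison, whereas both clinic children enter the total difficulties comparison.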


Results

Validity of the Arabic Development and Well-Being Assessment

A DAWBA diagnosis was made for 86 (89%) of the 97 clinical cases. Table 1 shows the cross-tabulation of DAWBA and clinic diagnoses for emotional disorders. The DAWBA agreed with the clinic diagnosis in 83% of cases.

Table 2 shows the cross-tabulation of DAWBA and clinic diagnoses for externalizing disorders (conduct/hyperkinesis). The DAWBA agreed with clinic diagnoses in 86% of cases. Overall there was substantial agreement between the DAWBA and the clinic diagnoses on the 2 main diagnostic groupings: emotional disorders and externalizing disorders.

Validity of the Arabic Strengths and Difficulties Questionnaire

Table 3 summarizes the ability of different SDQ scales and informants (parents and teachers) to distinguish between community and clinic subjects, as gauged by the AUC. All 7 SDQ scales (total impact, total difficulties, emotional symptoms, conduct problems, hyperactivity, peer problems and prosocial behaviour) seem potentially useful for predictive purposes. In each case, the AUC was significantly greater than 0.5 (P < 0.001). Discrimination was best for hyperactivity and conduct scales for both the parent SDQ (AUC = 0.97 for hyperactivity and 0.88 for conduct scale) and the teacher SDQ (AUC = 0.97 for hyperactivity and 0.86 for conduct scale).

In the clinic sample, discrimination for emotional, conduct and hyperactivity scores between patients with different sorts of disorders (as defined by the DAWBA) was also examined using the AUC (Table 4). For example, the SDQ hyperactivity score discriminated well between patients with hyperkinetic disorder and psychiatric controls, i.e. clinic patients without a hyperkinetic disorder but with other diagnoses instead. Similarly, conduct and emotional scores all discriminated satisfactorily between clinic cases with and without the corresponding type of disorders. All AUCs represented a level of prediction substantially better than chance (P < 0.001).

It is clear that both the parent SDQ and the teacher SDQ were as good at discriminating between different types of disorder within the clinic sample as they were at distinguishing between the clinic and the community sample.

Discussion


The main purpose of this study was to investigate the validity of the Arabic versions of 2 main measures of child psychopathology, namely the SDQ and the DAWBA.

The DAWBA worked well, judged by its agreement with independent clinical diagnosis. This study provided the first evidence for the validity of the Arabic version of the DAWBA. Furthermore, these results are consistent with findings from previous studies in other developing countries on the validity of the DAWBA parent interview. For example, in a Brazilian report, the DAWBA made a diagnosis on 94% of a clinical sample who had an independent clinical diagnosis, with agreement on diagnostic grouping for 78% [12]. In a study in Bangladesh, there was substantial agreement between the DAWBA and the independent clinic diagnosis (κ 0.63–0.94) [13].

Our study examined validity rather than reliability. Of course, the evidence for validity provides indirect evidence for reliability too; an unreliable set of measures would also have done poorly on tests of validity [6].

Because this was the first time a structured psychiatric interview had been used in Yemen, and because the measure was intended for subsequent use in a large-scale epidemiological study, the interviewers were encouraged to report their own opinions and the families’ feelings and attitudes about this relatively long interview. In general, interviewers reported very encouraging attitudes from the families and found the interviews enjoyable themselves because they felt they had acquired more knowledge about child psychiatric problems. The skip rules also made the interview shorter and easier to administer.

Using the SDQ, it was possible to discriminate well between community subjects and clinic patients on the basis of all 7 SDQ scales. Within the clinic sample as well, the SDQs were able to predict broad-band psychiatric diagnoses with a fair degree of accuracy. This level of accuracy could potentially be clinically useful. For example, children whose parent and teacher SDQ scores suggest that they are at a particularly high risk of a hyperkinetic disorder could be allocated to a professional with particular expertise in this domain.

These results support findings from 2 previous studies of the SDQ in Arab countries. The first was conducted in the Gaza Strip on children in 4 age bands [14]. In spite of the small sample size in each age group, the findings indicated that the SDQ was very promising as a screening measure or rating scale. More recently, in Yemen, Almaqrami and Shuwail found that the self-report version of the SDQ discriminated appropriately between a clinic and a community sample and was capable of detecting childhood emotional and behavioural disorders in clinical settings [15].

Conclusion


The present study suggests that the Arabic version of the SDQ may predict psychiatric diagnosis accurately enough to be of value for screening and epidemiological studies as well as for clinical assessment, and shows that the SDQ is not only a practical and economical but also a valid measure for assessing different behavioural aspects of children.

The DAWBA shows substantial agreement with independent clinical diagnoses. Such measures are essential for assessing child psychiatric diagnoses in large-scale surveys that aim to determine the prevalence of mental disorders in children in Yemen or other Arab countries. The DAWBA may also be useful for standardized assessment within child mental health services.

Acknowledgements


This study was partially supported by the Eastern Mediterranean Regional Office of the World Health Organization. The authors are grateful to all parents, teachers and interviewers involved.


