Note: I have tried to write this in as plain English as possible, but please see the terminology / definitions section at the end if there are terms that are unclear.
What on earth are the Dutch Studies and why would I care?
They are two scientific research papers that followed a group of young transgender people in Amsterdam, while they were given the following treatments:
Puberty blockers
Cross-sex hormones
Gender reassignment surgery
They were the first studies of their kind and are widely quoted as evidence for the effectiveness and safety of these treatments in adolescents.
As parents, it is our responsibility to help our child(ren) make informed decisions about these treatments, just like any other medical care. To do that, we need to ask questions like: Is it safe? Is it likely to help my child? What are the risks?
If we are going to answer those questions properly, we need to understand the evidence and be clear on what it does legitimately show and what it does not.
I don’t believe you need a deep knowledge of science to do that and my objective here is to (hopefully) explain this in a way that can be understood by any parent.
This article contains data from the studies themselves, plus additional analysis based on published reviews and on my own thoughts. Please do evaluate this critically – think for yourself about whether what I am saying is valid and, if there are gaps or mistakes, let me know. I’ll be pleased to hear about them, because I am genuinely pursuing truth and accuracy, and nothing else.
Why an Infographic?
I’m keenly aware that most often this type of information is presented as impenetrable pages of text and numbers. To try and make it easier to digest, I’ve created a visual representation that summarises the information:
Click here for high resolution version of the image
The following is additional information to accompany the graphic, providing further detail and discussion.
Individual Summaries
I’ve written up separate summaries of the individual papers, which are intended to be a straightforward representation of what the authors wrote, without any additional interpretation. Reading them is not at all necessary to understand this piece, but they are available here if you are interested:
Notes on Limitations
1. The study covered a small number of participants, from a single clinic.
The authors note this themselves, saying that “the study sample was small and came from only 1 clinic”. They also published the numbers responding to the various questionnaires:
For example, if I were the parent of a biologically male child and was helping them consider the risks and benefits of treatment, we would be basing this on the experiences of around 15 relevant individuals, which is not a large number.
The in-depth commentary produced in 2023 by Abbruzzese, Levine, & Mason raises a number of issues relating to the number of participants and the way they were selected. In particular, they note that the study started with 111 candidates under the age of 16, who were narrowed down to the 70 that were eventually included.
In short, the objective was to find the most suitable candidates rather than a representative cross section. That is perfectly legitimate for the purposes of the study, but it significantly limits how applicable the results are to a wider group.
2. The results have not been reproduced in any other group of adolescents.
This is covered in detail by Michael Biggs in his review (Biggs, 2022) and he references a number of attempts to replicate the results:
A study was planned with similar selection conditions and measurement scales at the Tavistock in London, beginning in 2010. The results for the first stage, involving puberty blockers, were eventually published in 2021, but this was not extended to the later stages of treatment (Carmichael, et al., 2021). The results for this first stage did not show the improvement in psychological functioning that was found in the Dutch studies, instead finding there was no significant change.
According to Biggs, the only other directly comparable study is one carried out in Hamburg, which looked at a very small sample of 11 adolescents, who were given puberty blockers for an average of one year (Becker-Hebly, et al., 2021). The authors of that paper state themselves in the abstract that “the findings cannot be generalized to other samples of transgender adolescents”.
There are several other studies cited by Biggs that look at the use of puberty blockers in adolescents but, as he notes, these have small study sizes, use different medications or measures and as such cannot be directly compared to the Dutch results.
A similar conclusion is reached in the review by Abbruzzese, Levine, & Mason (2023), where they say:
It is notable that the only attempt to replicate the 2011 Dutch study results with more than a handful of cases took place in the UK but failed (Carmichael, et al., 2021), with the conclusion of “no changes in psychological function” (p. 1).
Without corroborating results from other groups of adolescents, it is therefore difficult to be sure that the results seen for the few individuals included in the Dutch Studies are representative of transgender youths more generally.
3. Patients with later onset dysphoria or co-existing conditions such as Autism were explicitly excluded, and these results cannot be assumed to apply in a different group.
As the participants were explicitly selected based on having no additional clinical mental health issues, and as having experienced dysphoria since childhood, there is limited validity in applying the conclusions to adolescents who may have other mental health conditions, or who only experienced dysphoria later in life – e.g. when reaching puberty.
This is important to many of the current group of transgender adolescents seeking treatment, as amongst this group there tends to be a high rate of Autism and other mental health conditions, as well as a tendency for gender dysphoria to manifest at a later age.
For example, data published in 2019 showed that 48% of children and young people who were seen by the UK Gender Identity Development Service (GIDS) showed autistic traits (Churcher Clarke & Spiliadis, 2019). The same publication refers to a review of referrals to a Finnish clinic over a 2-year period where 65% presented with adolescent-onset gender dysphoria (defined as age 12 and above).
Abbruzzese, Levine, & Mason (2023) also comment on the differences between the common cases seen at present and those that were included, saying:
Thus, the Dutch protocol explicitly excluded the characteristics of adolescents presenting to clinics in recent years—those whose trans-identities emerged around puberty; non-binary presentations without the wish for a complete cross-sex reassignment; or cases of gender dysphoria accompanied by significant uncontrolled mental illness.
In such cases, where the background of an individual is significantly different to those included in the Dutch Studies, we cannot legitimately assume the same results would be seen.
4. The study concluded soon after surgery and complications or regret may take much longer to surface.
The final questionnaires took place an average of 1.5 years after surgery, which offers a limited window for complications or regret to occur. Biggs characterises this as follows (Biggs, 2022):
One inevitable limitation of the study was the measurement of results soon after surgery, which repeated the problem with the first study of adolescent transsexuals (Cohen-Kettenis & van Goozen, 1997). As Cohen-Kettenis notes, “a truly proper follow-up needs to span a minimum period of 20 years” (Cohen-Kettenis, 2021, pp. 117–118).
There has been a single longer-term follow-up on one participant of the Dutch Study, after 22 years, at the age of 35 (Cohen-Kettenis, Schagen, Steensma, de Vries, & Delemarre-van de Waal, 2011).
The individual in question, referred to as “B”, did not regret transition but did regret having a further metoidioplasty procedure after the conclusion of the original study. The case study records that he “did not like its size and shape and he could hardly urinate in a standing position. He was able to have orgasms, but he could not have sexual intercourse.”
They also say of B that he considered it likely that some relationship difficulties “had been related to his shame about his genital appearance and his feelings of inadequacy in sexual matters.”
Overall, the authors concluded that “B functioned well in most aspects of life, but that he was still struggling with the question, how to handle the dissatisfaction and shame about his genital appearance”.
More subjectively, it also seems likely that it would only be later in life that regrets might emerge over loss of fertility and the final questionnaire after 1.5 years is unlikely to capture such cases.
5. Patients received a range of care from a multi-disciplinary team, including treatment such as psychotherapy, and this may be responsible for some or all of the observed results.
This is something that the authors highlighted, saying that the evidence should be used “cautiously” and emphasising that it is not only medical intervention that is important, but also:
A “comprehensive multidisciplinary approach” that attends to gender dysphoria and further well-being
A supportive environment
Without a control group, it is impossible to know what effects were a result of the medical treatment given to participants, and what may be a result of the other types of support – for example, the regular sessions with a psychotherapist or psychiatrist.
The authors discuss this, saying that, ideally, the study would have been a randomised, blind trial with a control group, but they also point out that this is challenging to achieve. The reasons given were the difficulty in motivating adolescents to take part and possible ethical concerns over denying access to treatment to those in the control group.
Logically speaking, it is also difficult to see how you could conduct a completely blind trial. Participants would very likely be able to tell from the effects on their bodies whether puberty was blocked or whether cross-sex hormones had been given, and they would certainly know whether or not they received surgery.
Nevertheless, the absence of a control group means we cannot say with certainty that any particular treatment or even all combined are the cause of the observed effects.
Abbruzzese, Levine, & Mason also raised this in their 2023 review, saying:
The finding of modest psychological benefits was compromised by the conflation of medical interventions with psychotherapy, making it impossible to determine whether gender reassignment, therapy, or the psychological maturation that occurs with the passage of time led to these few modest “improvements.”
6. The measured improvement in gender dysphoria may be influenced by switching questionnaires at the final step.
This is discussed in the reviews by Michael Biggs (2022) and by Abbruzzese, Levine, & Mason (2023) but I wanted to look at this for myself and understand why / to what extent it might be an issue.
I sourced a copy of the questionnaires used for the study for the UGDS assessments (Schneider, et al., 2016) and ran my own, imaginary, example case. It’s completely hypothetical, so have a look and judge for yourself as to whether my logic matches your thoughts or not.
I imagined the case of a highly dysphoric biological female, who identifies as trans, and imagined they were filling out the questionnaire at the beginning of the process, before any treatment had occurred. I filled out both of the questionnaires in the way I think this hypothetical person might do, and the below is what I came up with.
As you can see, some of the questions don’t really make sense, and I’ve marked those with a question mark in the relevant column. You might disagree with how I’ve answered them - for example, if you do not have a penis, it is difficult to rate “I dislike having erections” sensibly. I’ve answered “disagree completely” but you could make a case for “neutral” and probably other responses.
Putting these exceptions aside, I think most people would come up with something fairly similar to the above for this imaginary person.
Questionnaire 1 is the set of questions that would be given at the start of the process – the biological female, or “female to male” questions. Questionnaire 2 is the biological male, or “male to female” version, which would have been given to the same individual at the end of the process.
If we score these, we discover that for the first set of questions, we score 60 – the maximum or most dysphoric score, which is what we might expect. However, for the second set of questions, the above answers score 12, which is the minimum or least dysphoric score.
The second score would be slightly higher if the ambiguous questions were, say, marked as “neutral” (in that case it would be 20) but it is still significantly lower than the first set.
This would seem to suggest that, just by switching questionnaires, without carrying out any treatment, we could see a significant drop in the measured gender dysphoria. To my mind, this puts the UGDS scores on pretty shaky ground.
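The arithmetic behind those three totals can be sketched in a few lines of Python. This is only an illustration of my hypothetical case, not the official scoring procedure: it assumes each answer has already been converted to points from 1 (least dysphoric) to 5 (most dysphoric), and the figure of four ambiguous questions is inferred from the total of 20 quoted above rather than taken from the questionnaire itself.

```python
# Illustrative sketch only: per-item points are assumed to be already
# converted to 1 (least dysphoric) .. 5 (most dysphoric).
ITEMS = 12  # 12 items on a 5-point scale gives the 12-60 range
MIN_POINTS, NEUTRAL_POINTS, MAX_POINTS = 1, 3, 5


def ugds_total(item_points):
    """Add up the points for one completed questionnaire."""
    if len(item_points) != ITEMS:
        raise ValueError(f"expected {ITEMS} items, got {len(item_points)}")
    if any(not MIN_POINTS <= p <= MAX_POINTS for p in item_points):
        raise ValueError("each item must score between 1 and 5")
    return sum(item_points)


# Questionnaire 1 (female-to-male version): every item at its most dysphoric.
start_version = [MAX_POINTS] * ITEMS

# Questionnaire 2 (male-to-female version): every item at its least dysphoric.
end_version = [MIN_POINTS] * ITEMS

# Same again, but with the ambiguous items marked "neutral".
# ASSUMPTION: four ambiguous items, inferred from the total of 20.
AMBIGUOUS_ITEMS = 4
end_version_neutral = (
    [NEUTRAL_POINTS] * AMBIGUOUS_ITEMS
    + [MIN_POINTS] * (ITEMS - AMBIGUOUS_ITEMS)
)

print(ugds_total(start_version))        # 60
print(ugds_total(end_version))          # 12
print(ugds_total(end_version_neutral))  # 20
```

The same imaginary person, with no treatment in between, lands at opposite ends of the scale depending only on which version of the questionnaire is handed to them.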
But what about the BIS scores?
The questionnaires used for this scale also have different male and female versions, but there are fewer differences between them. The participant rates satisfaction with various body parts on a 5-point scale. The ratings are grouped into primary, secondary and neutral sex characteristics, as follows (van de Grift, et al., 2016):
Since the categories are largely the same for both questionnaires, there is less opportunity for the results to be affected by the use of different versions.
However, there are still some issues here – for example, what does it mean for a biologically female transman who has had no genital surgery to rate their satisfaction with their penis, scrotum or testicles?
Again, I imagined a highly dysphoric biological female who identified as trans, and thought about how they might answer each of these, right at the start of the process. The answers to the female-oriented one are fairly obvious, I think:
The average score would therefore be 5, as you would expect if you are assuming maximum dysphoria.
But how they might answer the male version is a bit harder to suppose. If they literally assumed penis = clitoris, scrotum = vagina, testicles = uterus, then I suppose they would answer the same as the above, “very dissatisfied” on all counts.
If they thought to themselves, “this doesn’t apply to me, I don’t have those body parts”, they would probably answer “neutral” on those items, giving this:
If they answered this way, the average score for primary sex characteristics would be 4 – perhaps not such a marked skew as with UGDS, but a reduction of 1 nevertheless, just by using a different questionnaire.
If we look at the separate results for biological males and females in the study:
The overall reduction for primary sex characteristics in females is only 1.29 from start to finish. So, if just using the different questionnaire caused the number to drop by 1, that would be highly significant in these results.
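To make that comparison concrete, here is a rough sketch of the calculation in Python. The item counts are purely hypothetical: I have assumed six primary-characteristic items, three of which are the ones that do not apply, because that is one combination that reproduces the averages of 5 and 4 described above. The real questionnaire may group items differently.

```python
# Illustrative sketch only. Ratings run from 1 ("very satisfied")
# to 5 ("very dissatisfied"); 3 is "neutral".
VERY_DISSATISFIED, NEUTRAL = 5, 3


def group_average(ratings):
    """Average rating for one group of body parts."""
    if not ratings:
        raise ValueError("need at least one rating")
    return sum(ratings) / len(ratings)


# ASSUMPTION: six primary items in each version, three of them ambiguous.
PRIMARY_ITEMS = 6
AMBIGUOUS_ITEMS = 3

# Female version at the start: "very dissatisfied" on every primary item.
female_version = [VERY_DISSATISFIED] * PRIMARY_ITEMS

# Male version, same person, no treatment: "neutral" on the items that
# do not apply, "very dissatisfied" on the rest.
male_version_neutral = (
    [NEUTRAL] * AMBIGUOUS_ITEMS
    + [VERY_DISSATISFIED] * (PRIMARY_ITEMS - AMBIGUOUS_ITEMS)
)

# Drop in the average caused by the change of questionnaire alone.
questionnaire_effect = (
    group_average(female_version) - group_average(male_version_neutral)
)

# Overall reduction for females quoted from the study results above.
reported_reduction = 1.29
share = questionnaire_effect / reported_reduction

print(group_average(female_version))        # 5.0
print(group_average(male_version_neutral))  # 4.0
print(f"{share:.0%}")                       # 78%
```

Under these assumptions, a drop of 1 from the change of questionnaire would be equivalent to roughly three quarters of the 1.29 reduction measured in females.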
You can do the same for a hypothetical male identifying as a transwoman, and you get similar results. The main difference is that males underwent vaginoplasty, so the terminology in the final BIS questionnaire makes a little more sense, although they would still face the same problems giving an answer on “uterus”.
I also wonder whether the fact that the final questionnaires used more affirming language could possibly influence how satisfied participants reported feeling.
Of course, none of this proves the results in the study are wrong, or that participants didn’t experience an improvement in gender dysphoria – it seems likely they did – but it calls into question the confidence we can have in the measurements that were made.
7. The studies did not investigate any of the following and can therefore provide no meaningful information in these areas:
Physical health, such as bone density, brain development, cardiac health
Reversibility of treatments for patients who discontinue their use
The authors themselves note that there was no assessment of the physical side-effects of treatments. The cases studied were only those who continued on through treatment, and no data was captured on any participants who discontinued treatment (if indeed there were any).
The review by Abbruzzese, Levine, & Mason (2023) also notes this, saying:
The Dutch studies did not evaluate physical health outcomes of “gender-affirmative” treatments, even though adverse effects of hormonal interventions on bone and brain had been hypothesized from the start (and were confirmed by subsequent research).
Given there is no data in these areas, it follows that the studies can offer no meaningful information on these topics.
8. Although excluded from the study, the death that occurred in one patient may have been influenced by the treatment. Specifically, the use of puberty blockers reduces the amount of penile tissue available for vaginoplasty, and increases the chances of requiring surgery using intestinal tissue, which carries higher risks.
This case is also examined in some detail in Michael Biggs’ review (Biggs, 2022), in which he sourced a paper that he says relates to this case (Negenborn, van der Sluis, Meijerink, & Bouman, 2017). Although I have not been able to verify the full content myself, it is clear from the abstract that it refers to a very similar case that occurred at the same medical centre in Amsterdam.
This same scenario – that puberty blockers can have a negative impact on subsequent vaginoplasty – is also mentioned as a serious known issue within the Tavistock clinic in London, in Hannah Barnes’ book about the UK gender identity service (Barnes, 2023, pp. 124-6, 341-2).
Terminology / Definitions
Gender Dysphoria
There are some differences of opinion on this but in short it means negative feelings about parts of your body that are strongly associated with a particular gender, and a resultant desire to be a different gender.
It is included in the DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition) published by the American Psychiatric Association and can be considered a mental disorder on this basis.
Gender Identity Development Service (GIDS)
All care for transgender kids in the UK under the NHS is currently provided by GIDS, the Gender Identity Development Service. At present, this operates mainly from the Tavistock & Portman Trust in London, often referred to as just “The Tavistock”. I say “at present” because a recent independent review (Cass, 2022) led to the recommendation for it to be closed and replaced with regional centres.
Puberty Blockers
These are medications that block the action of key sex hormones in the body and, in adolescent children, have the effect of slowing or stopping the progress of puberty. The medication given is typically a Gonadotropin-Releasing Hormone agonist (GnRHa).
Cross-sex Hormones
Sometimes referred to as “Gender Affirming Hormones”, these are hormones associated with the opposite biological sex, which are given to produce sexual characteristics that are more consistent with that sex – e.g. breasts, facial hair. Typically, this means synthetic oestrogen for biological males and synthetic testosterone for biological females.
Gender Reassignment Surgery
This refers to surgical procedures carried out to modify the patient’s body to give an appearance that is more typical for the desired gender. This usually includes one or more of the following procedures:
Vaginoplasty – removal of the testicles and inverting the penis to create a neo-vagina
Mastectomy – removal of breasts
Hysterectomy – removal of the uterus (womb)
Ovariectomy – removal of ovaries
Metoidioplasty – construction of a neo-penis using existing genital tissue
Phalloplasty – construction of a neo-penis using tissue taken from elsewhere in the body
Child / Adult Behaviour Checklist (CBCL/ABCL)
Youth / Adult Self Report (YSR/ASR)
These are standard questionnaires, completed by parents (CBCL) and by the participant themselves (ABCL/YSR/ASR), which assess a range of general behavioural and emotional problems.
A score of more than 63 is considered to indicate a clinical issue, with higher scores indicating a more significant effect on function. Scores can be separated into “internalizing” and “externalizing” issues, or presented as a combined value (de Vries, et al., 2011, 2014).
Utrecht Gender Dysphoria Scale (UGDS)
This is a scale used to rate gender dysphoria, based on two sets of questions, with participants rating statements on a 5-point scale, from “agree completely” to “disagree completely”. The following is the list of questions used along with the scores for each item:
Total scores range from 12 to 60, with a higher score indicating a higher level of dysphoria.
The original paper defining this scale is available here:
https://doi.org/10.1097/00004583-199702000-00017 (Cohen-Kettenis & van Goozen, 1997)
As this is not accessible without a fee, I have taken the details from a study that references it, here:
This also matches the versions used by the UK Gender Identity Development Service that were provided in this Freedom Of Information request:
https://www.whatdotheyknow.com/request/request_for_paperwork_for_gids_a
Body Image Scale (BIS)
This is a scale used to rate satisfaction with various parts of the body, with participants rating each body part on a 5-point scale, from “very satisfied” to “very dissatisfied”. The scores are grouped into primary, secondary and neutral sex characteristics, as follows:
The results are typically presented as an average score for each of these groups, with a value from 1 to 5.
The original paper defining this scale is available here:
https://doi.org/10.1007/BF01544272 (Lindgren & Pauly, 1975)
As this is not accessible without a fee, I have used a copy that was reproduced here:
https://www.researchgate.net/publication/282944502_Body_Satisfaction_and_Physical_Appearance_in_Gender_Dysphoria/link/5f74bb13458515b7cf5b744a/download (van de Grift, et al., 2016)
This also closely matches the categories used by the UK Gender Identity Development Service that were provided in this Freedom Of Information request:
https://www.whatdotheyknow.com/request/request_for_paperwork_for_gids_a
Children’s Global Assessment Scale (CGAS)
This is a general assessment of mental health in a child, with the clinician assessing impact on the patient’s ability to function in day-to-day life. A higher number indicates better global functioning or a lower level of disturbance, with the following bands defined:
The original paper defining this scale is available here:
https://pubmed.ncbi.nlm.nih.gov/6639293/ (Shaffer, et al., 1983)
As this is not available without a fee, I have used a copy reproduced here:
https://www.researchgate.net/publication/272824387_On_the_Children's_Global_Assessment_Scale (Lundh, 2012)
References
Abbruzzese, E., Levine, S. B., & Mason, J. W. (2023, Jan 2). The Myth of "Reliable Research" in Pediatric Gender Medicine: A critical evaluation of the Dutch Studies — and research that has followed. Journal of Sex & Marital Therapy. Retrieved from https://www.tandfonline.com/doi/full/10.1080/0092623X.2022.2150346
American Psychiatric Association. (2022). Diagnostic and statistical manual of mental disorders (5th ed., text rev.).
Barnes, H. (2023). Time to Think: The Inside Story of the Collapse of the Tavistock’s Gender Service for Children. Swift Press.
Bazelon, E. (2022, June 15). The battle over gender therapy. Retrieved from New York Times: https://www.nytimes.com/2022/06/15/magazine/gender-therapy.html
Becker-Hebly, I., Fahrenkrug, S., Campion, F., Richter-Appelt, H., Schulte-Markwort, M., & Barkmann, C. (2021). Psychosocial health in adolescents and young adults with gender dysphoria before and after gender-affirming medical interventions: A descriptive study from the Hamburg Gender Identity Service. European Child & Adolescent Psychiatry, 30, 1755–1767.
Biggs, M. (2019). A letter to the editor regarding the original article by Costa et al: Psychological support, puberty suppression, and psychosocial functioning in adolescents with gender dysphoria. Journal of Sexual Medicine, 16(12), 2043.
Biggs, M. (2022). The Dutch Protocol for Juvenile Transsexuals: Origins and Evidence. Journal of Sex & Marital Therapy, Advance Online Publication, 1–21.
Carmichael, P. (2016, June 18). Time to reflect: Gender dysphoria in children and adolescents, defining best practice in a fast changing context. World Professional Association for Transgender Health. Retrieved from http://av-media.vu.nl/VUMedia/Play/581e58c338984dafb455c72c56c0bfa31d?catalog=2d190891-4e3f-4936-a4fa-2e9766ae0d0d
Carmichael, P., Butler, G., Masic, U., Cole, T. J., De Stavola, B. L., Davidson, S., . . . Viner, R. (2021). Short-term outcomes of pubertal suppression in a selected cohort of 12 to 15 year old young people with persistent gender dysphoria in the UK. PLoS One, 16(2): e0243894.
Cass, H. (2022). The Cass Review. Retrieved from https://cass.independent-review.uk/
Churcher Clarke, A., & Spiliadis, A. (2019). ‘Taking the lid off the box’: The value of extended clinical assessment for adolescents presenting with gender identity difficulties. Clinical Child Psychology and Psychiatry, 24(2), 338–352.
Cohen-Kettenis, P. (2021). In A. Bakker, The Dutch approach: Fifty years of transgender health care at the VU Amsterdam gender clinic (pp. 117-118).
Cohen-Kettenis, P. T., & van Goozen, S. H. (1997). Sex Reassignment of Adolescent Transsexuals: A Follow-up Study. Journal of the American Academy of Child & Adolescent Psychiatry, 36(2), 263-271.
Cohen-Kettenis, P. T., Schagen, S. E., Steensma, T. D., de Vries, A. L., & Delemarre-van de Waal, H. A. (2011). Puberty suppression in a gender-dysphoric adolescent: A 22-year follow-up. Archives of Sexual Behavior, 40, 843–847.
Costa, R., Dunsford, M., Skagerberg, E. H., Carmichael, P., & Colizzi, M. (2015). Psychological Support, Puberty Suppression, and Psychosocial Functioning in Adolescents with Gender Dysphoria. The Journal of Sexual Medicine, 12(11), 2206–2214.
De Vries, A. L., McGuire, J. K., Steensma, T. D., Wagenaar, E. C., Doreleijers, T. A., & Cohen-Kettenis, P. T. (2014). Young adult psychological outcome after puberty suppression and gender reassignment. Pediatrics, 134(4), 696-704.
De Vries, A. L., Steensma, T. D., Doreleijers, T. A., & Cohen‐Kettenis, P. T. (2011). Puberty suppression in adolescents with gender identity disorder: A prospective follow‐up study. The Journal of Sexual Medicine, 8(8), 2276-2283.
Lindgren, T. W., & Pauly, I. B. (1975). A body image scale for evaluating transsexuals. Archives of Sexual Behavior, 4, 639–656.
Lundh, A. (2012). On the Children’s Global Assessment Scale. PhD Thesis, Karolinska Institutet.
Negenborn, V. L., van der Sluis, W. B., Meijerink, W. J., & Bouman, M.-B. (2017). Lethal necrotizing cellulitis caused by ESBL-producing E. coli after laparoscopic intestinal vaginoplasty. Journal of Pediatric and Adolescent Gynecology, 30, e19–e21.
Schneider, C., Cerwenka, S., Nieder, T. O., Briken, P., Cohen-Kettenis, P., Cuypere, G., . . . Richter-Appelt, H. (2016). Measuring gender dysphoria: A multicenter examination and comparison of the Utrecht gender dysphoria scale and the gender identity/gender dysphoria questionnaire for adolescents and adults. Archives of Sexual Behavior, 45, 551–558.
Shaffer, D., Gould, M. S., Brasic, J., Ambrosini, P., Fisher, P., Bird, H., & Aluwahlia, S. (1983). A children's global assessment scale (CGAS). Archives of General Psychiatry, 40(11), 1228-1231.
Tavistock and Portman NHS Foundation Trust. (2021, April 30). Freedom of Information Request: Request for paperwork for GIDs assessment and treatment clinic. Retrieved from What Do They Know: https://www.whatdotheyknow.com/request/request_for_paperwork_for_gids_a
van de Grift, T. C., Cohen-Kettenis, P. T., Steensma, T. D., De Cuypere, G., Richter-Appelt, H., Hebold Haraldsen, I. R., . . . Kreukels, B. (2016). Body Satisfaction and Physical Appearance in Gender Dysphoria. Archives of Sexual Behavior, 45, 575–585.
Updates / Version History
26th April 2023
Updated to include information from Abbruzzese, E., Levine, S. B. & Mason, J. W., The Myth of "Reliable Research" in Pediatric Gender Medicine: A critical evaluation of the Dutch Studies — and research that has followed, 2023.