

Image credit: Wellcome Sanger Institute
Haerin is analysing genetic data from lupus patients to understand why there are such big differences between people affected by the disease. Here, she tells us about her work, eQTLs and cake.
Haerin Jang is a PhD student working on the genetics of Systemic Lupus Erythematosus (known as SLE, or lupus). Lupus is an autoimmune disease affecting 50,000 people in the UK. It can impact many parts of the body, and symptoms can vary from mild to severe, with permanent tissue and organ damage in some cases. The exact causes of lupus are complex and not fully understood, though researchers have uncovered many of the genes that play a role.
Haerin is analysing genetic data from lupus patients. She aims to understand why there are such big differences between how different people are affected by the disease. The data and analyses from her work will be open for other researchers, through Open Targets, creating a valuable resource for new advances into understanding the condition.
In this article, Haerin tells us about her work, the importance of eQTLs, and cake.
Hi Haerin, can you tell us about your work into lupus?
Lupus is a very diverse condition. So much so that it’s been really difficult to develop effective drug treatments. My research is focused on understanding that variation. We want to understand why some people develop more severe forms of the disease, such as lupus nephritis, where the kidney is affected, and risk of death is higher. We also want to know why some people respond very well to drugs, but others don't.
We know that a combination of common mutations across many different genes, plus an interaction with environmental factors, can cause lupus, which makes it complex. There are a lot of studies on how lupus develops. But these differences between patients haven't been studied as much through genetics.
How are you studying the variation of the disease?
We have genetic data from 300 lupus patients with European, African Caribbean and Asian ancestries, in the form of whole genome sequences. Among the three billion letters of DNA code in someone’s genome, there might be a pattern of changes, or mutations, that have some effect on disease severity.
We also have single-cell sequence data, which tell us which genes are active in a cell and which proteins are present on the surface of immune cells. In total, I am analysing data from 750,000 individual cells. I’m writing code, developing computational tools and using statistical models to understand and interpret all these data.
From these data, we can stratify the patients into groups, for example by disease severity, to see how their genetics plays a role. It may not be solvable – the datasets are complex – but we hope to pinpoint some of the regions in the genome that affect cell-type-specific gene expression and influence the course of disease.
It’s at a really exciting stage now, because we have all these data – some of which are from the sequencing work that was done by the DNA Pipelines team here at Sanger – and now we are starting to analyse them.

Haerin Jang, Wellcome Sanger Institute, analysing genetic data from lupus patients.
What are you looking for in the genome that might affect the disease severity?
We’re interested in ‘expression quantitative trait loci’, or eQTLs. These are DNA mutations that affect the expression of a gene: that is, whether it is active in a cell or not, and to what level. An eQTL might be in a region of the genome that regulates a nearby gene, for example. You could describe an eQTL as a dimmer switch for a light.
There have been a lot of large-scale studies into the genetics of lupus, in terms of the genes involved, so we expect to find eQTLs associated with those genes. But we also hope to find new signals specific to an immune cell type or related to disease severity.
I think with a complex condition like lupus, understanding the genetics of the disease can play a big role. It could allow us, in future, to define disease states in the clinic. It could also lead to more effective drug development, or enable clinicians to select a drug for someone, based on their genetics.
What is the most challenging part of your work?
Because I'm dealing with clinical diversity, harmonising and effectively using the clinical data for analysis is one of the biggest challenges. There’s a lot of clinical data collected for our study, like drug doses, disease severity scores, blood tests and past symptom histories. These are key to understanding the patients and linking them to genetic data. But the records are often challenging to understand, or difficult to format in a uniform way for comparisons. So we actually put a lot of time into this process. We also work closely with clinicians to make sure we are using the most appropriate data – we want our results to be used at the end of this. Converting clinical data into a format, and structuring it in a way, that our statistical models can understand is a big challenge.
Could you describe the Sanger Institute in 10 words?
Genome sequencing at scale, to understand the meaning of life, disease and death.
That’s more than ten! But I think that does it. The scale here is really important. It means we can ask different research questions, the big questions that aren’t possible to answer at other places.
If you could time travel, where would you go?
That’s a really hard question to answer, but I would go to the future if I could. I would like to see how all this genetic research actually feeds into our understanding of disease, and pathology, and how it advances medicine.
I think that we’re now in an era of genetic research; it's booming. But it takes a while to actually have an impact on people's lives. I’d like to see that.
Is there a word or phrase that is overused in your team?
Well, in our office, we like to celebrate. We like tea and cake. So it’s probably ‘cake anyone?’, which we go up and whisper behind people whenever there is cake – which is regularly!
Also, once a week there is a programme-wide coffee break – it’s a great place to have interesting discussions with people from other groups. Because I’m in both the Cellular Genetics Programme and the Human Genetics Programme, I get to go to both coffee breaks. That's one thing I really like about the Institute - its collaborative nature, and lots of cake.
But science-wise, we’re always thinking about what might be driving the patterns we see in our data, including those that are not biological. So we often say ‘sanity-check’. We need to sanity-check our results to make sure they make sense and are truly meaningful.
Find out more
- Davenport research group at the Sanger Institute and group website
- Open Targets platform
- Lupus UK website, a source of information and help for lupus patients






