Functions of ARID3a in multiple hematopoietic cell types
Abstract
Since the completion of the Human Genome Project in 2003, scientists have had a map of all the base pairs that make up human DNA. Beginning in 2008, the development of RNA-seq allowed scientists to directly sequence the messenger RNA transcribed from genes in cells. In 2011, computing power and the cost of sequencing (1) were just beginning to reach a point that allowed for high-throughput sequencing experiments that could define a person’s genome and transcriptome and identify potential disease-causing mutations, with important implications for personalized medicine. One year later, the introduction of single-cell RNA-seq (scRNA-seq) made it possible to interrogate the transcriptome of a single cell (2). Previous bulk studies represent an ensemble of transcriptomes from many cells and therefore could mask important differences between cells. Studies have shown that individual cells have remarkable heterogeneity, even when isolated from a seemingly homogeneous population (3). Although single-cell technology exists, there is a high need for experts who can analyze and interpret the large, high-dimensional data produced by next-generation sequencing experiments. For example, the number of scRNA-seq datasets uploaded to the Gene Expression Omnibus (GEO) has increased from 15 in 2012 to over 15,000 in 2020 (4). Therefore, my primary objective for graduate school was to develop a solid background in bioinformatics and to apply next-generation sequencing and microfluidic technology to questions in human disease.
To meet this objective, I have used multiple biologic systems to garner expertise in bioinformatics by exploring functions of the transcription factor ARID3a. ARID3a is an understudied protein with unknown functions that is expressed in hematopoietic cells during development and in various types of adult stem cells (5). It is a member of a large family of epigenetic regulators, and dimerization of ARID3a is necessary for binding to DNA in a sequence-specific fashion (6,7). The few studies that investigate the function of ARID3a indicate that it has functions in both activating and repressing gene expression. The first evidence for a role of ARID3a in activating expression came from studies in B cells isolated from mice. These studies revealed that ARID3a is required for proper immunoglobulin heavy chain expression through binding the intronic heavy chain enhancer (8-10). Later studies revealed that ARID3a expression is developmentally restricted and that it binds directly to promoter/enhancer regions of the pluripotency master regulators OCT4, SOX2, and NANOG to contribute to their repression in mouse embryonic fibroblasts and stem cells (11,12). However, the mechanism by which ARID3a contributes to activation or repression of gene expression remains elusive. Therefore, my goals were to use sequencing technology to 1) investigate the importance of ARID3a in biologically important systems (i.e. lupus, hematopoietic cells, erythropoiesis) and 2) provide insights into the functions of ARID3a using different techniques (i.e. RNA-seq, ATAC-seq, single-cell RNA-seq).
To meet these goals, I have used RNA-seq to show that ARID3a expression is associated with disease activity in two cell types, plasmacytoid dendritic cells (pDCs) and low-density neutrophils (LDNs), which are key players in inflammatory pathways observed in patients with systemic lupus erythematosus (SLE) (13). The RNA-seq data I generated also show that ARID3a is lowly expressed in pDCs and LDNs isolated from both SLE patients and healthy controls. Therefore, it was not possible to perform typical differential expression analysis based on ARID3a+ and ARID3a- samples. This finding also explains why other investigators using traditional RNA-seq analysis have not identified ARID3a as important to disease activity in these cell types. Instead, I performed unsupervised hierarchical clustering to show that the RNA-seq data cluster based on the levels of ARID3a protein and that SLE disease activity scores strongly correlate with ARID3a protein (13). I also show through correlation analysis that ARID3a functions epigenetically by repressing and activating many genes.
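As a minimal illustration of the clustering and correlation steps described above, and not the analysis pipeline used in this work, the following Python sketch groups samples by their expression profiles and then checks how disease activity tracks with ARID3a protein. All sample names, expression values, ARID3a protein levels, and disease activity scores are invented toy values. The choice of average linkage, correlation distance, and Spearman correlation is also an assumption, since the abstract does not specify them.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of samples with distance = 1 - Pearson r."""
    names = list(profiles)
    dist = {frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = [[n] for n in names]

    def linkage(pair):
        # Average pairwise distance between members of two clusters.
        c1, c2 = pair
        return mean(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2), key=linkage)
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return [sorted(c) for c in clusters]

# Hypothetical toy data: expression of four genes per sample (arbitrary units).
profiles = {
    "S1": [8.0, 7.5, 1.0, 1.2],
    "S2": [7.8, 8.1, 0.9, 1.5],
    "S3": [8.2, 7.9, 1.3, 1.0],
    "S4": [1.1, 0.8, 7.7, 8.3],
    "S5": [0.9, 1.4, 8.0, 7.6],
    "S6": [1.3, 1.0, 8.4, 7.9],
}
# Hypothetical ARID3a protein levels and disease activity scores per sample.
arid3a_protein = {"S1": 0.9, "S2": 0.8, "S3": 0.7, "S4": 0.2, "S5": 0.1, "S6": 0.3}
disease_activity = {"S1": 12, "S2": 10, "S3": 9, "S4": 4, "S5": 2, "S6": 5}

clusters = average_linkage(profiles, n_clusters=2)
for members in clusters:
    level = mean(arid3a_protein[s] for s in members)
    print(members, "mean ARID3a protein:", round(level, 2))

samples = sorted(profiles)
rho = spearman([arid3a_protein[s] for s in samples],
               [disease_activity[s] for s in samples])
print("Spearman rho (ARID3a protein vs disease activity):", round(rho, 2))
```

In this toy setting the clustering never sees the protein measurements, so comparing mean ARID3a protein between the resulting clusters is a simple way to ask whether the transcriptome alone separates samples by ARID3a level.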
Previous work revealed that ARID3a deletion in mice resulted in an embryonic lethal phenotype, but the rare survivors had a significant reduction in erythrocytes and B cells (14). Using the early hematopoietic cell line K562, which can be stimulated with hemin to induce erythrocyte differentiation, I show with RNA-seq that ARID3a protein is necessary for fetal globin expression and erythrocyte development. My data identify genes affected by ARID3a expression and indicate that ARID3a functions in a cell type-specific fashion. The Assay for Transposase-Accessible Chromatin (ATAC)-seq data I generated using K562 cells show for the first time that ARID3a alters chromatin accessibility of enhancer regions essential for the induction of erythroid-specific genes. These findings allowed me to learn bulk RNA-seq and ATAC-seq analyses and led to the investigation of ARID3a in primary human cells using single-cell RNA-seq.
Single-cell RNA-seq was performed on hematopoietic stem cells (HSCs) from aged and young donors and revealed that ARID3a levels have an impact on B cell fate decisions. This work shows that single-cell technology can be used to identify changes based on ARID3a transcript in HSCs.
Finally, to accomplish my goal of applying the technologies described above to human disease, I have employed microfluidic technology to capture single naïve B lymphocytes from SLE patients. Healthy naïve B cells do not express ARID3a. However, ARID3a is present in ~50% of the naïve B cells of SLE patients. Additionally, since ARID3a is an intracellular protein, it is not possible to isolate ARID3a+ B lymphocytes without affecting RNA integrity. I therefore used the microfluidic approach to capture both ARID3a-expressing and ARID3a-negative single naïve B lymphocytes from SLE patients. Briefly, B lymphocytes were isolated using known surface markers representing naïve B cell subsets and were then captured on a Fluidigm C1 Single-Cell Auto Prep system. At present, there are significant knowledge gaps, which will be detailed in Chapter 2, regarding how the expression of ARID3a contributes to increased disease activity and what causes the increased inflammatory responses in SLE patients. We hypothesize that ARID3a expression will identify autoreactive naïve B cells that have broken tolerance and/or are part of the inflammatory responses observed in SLE.