publications | Yuan Pu

Please check my Google Scholar for the latest list. * indicates equal contribution.

2026

MedRedFlag: Investigating how LLMs Redirect Misconceptions in Real-World Health Communication

Sraavya Sambara*, Yuan Pu*, Ayman Ali, and 3 more authors

2026

2025

Ethylene Glycol Monomethyl Ether Altered Rat Sperm Small RNAs with Critical Developmental Roles

Yuan Pu*, August Guang*, Xinran Qi, and 3 more authors

bioRxiv, 2025

Ethylene glycol monomethyl ether (EGME) is a testicular germ cell toxicant that selectively targets spermatocytes. In rats, male-only EGME exposure reduces mating success and can lead to an increase in resorbed fetuses. In a previous study, five-day exposure to 50, 60, or 75 mg/kg/d EGME in male rats led to a decrease in sperm motility and increase in retained spermatid heads with a LOAEL of 75 mg/kg/d. At 60 mg/kg/d, EGME exposure altered the proportion of sperm small RNA reads mapped to different small RNA categories and the distribution of read lengths. Because there is evidence that small non-coding RNAs (sncRNAs) in sperm regulate embryonic development, we analyzed sperm sncRNA data from EGME-treated male rats to identify differential expression at the individual RNA level. EGME treatment resulted in dose-dependent increases in the expression levels of microRNAs (miRNAs), piRNAs, and tRNA-derived small RNAs (tsRNAs). We identified 12 miRNAs that were differentially expressed at all EGME doses, with a monotonic, dose-dependent increase. High-confidence targets of these 12 miRNAs are known to be expressed in pre-implantation embryos and statistically enriched for Gene Ontology (GO) biological processes related to early development, such as cell fate commitment and regulation of developmental growth. These results demonstrated that the EGME-induced changes in sperm sncRNA levels were reproducible, dose-dependent, and provided a putative mechanism of paternal EGME effects on embryonic development, which will be investigated in future studies. Competing Interest Statement: The authors have declared no competing interest. National Institute of General Medical Sciences, https://ror.org/04q48ey07, P20 GM109035, P20 GM156712. Brown University, https://ror.org/05gq02987.

Counting Clues: A Lightweight Probabilistic Baseline Can Match an LLM

Furong Jia*, Yuan Pu*, Finn Guo, and 1 more author

ML4H 2025 (Findings), 2025

Large language models (LLMs) excel on multiple-choice clinical diagnosis benchmarks, yet it is unclear how much of this performance reflects underlying probabilistic reasoning. We study this through questions from MedQA, where the task is to select the most likely diagnosis. We introduce the Frequency-Based Probabilistic Ranker (FBPR), a lightweight method that scores options with a smoothed Naive Bayes over concept-diagnosis co-occurrence statistics from a large corpus. When co-occurrence statistics were sourced from the pretraining corpora for OLMo and Llama, FBPR achieves comparable performance to the corresponding LLMs pretrained on that same corpus. Direct LLM inference and FBPR largely get different questions correct, with an overlap only slightly above random chance, indicating complementary strengths of each method. These findings highlight the continued value of explicit probabilistic baselines: they provide a meaningful performance reference point and a complementary signal for potential hybridization. While the performance of LLMs seems to be driven by a mechanism other than simple frequency aggregation, we show that an approach similar to the historically grounded, low-complexity expert systems still accounts for a substantial portion of benchmark performance.

Machine learning on multiple epigenetic features reveals H3K27Ac as a driver of gene expression prediction across patients with glioblastoma

Yusuke Suita, Hardy Bright, Yuan Pu, and 4 more authors

PLOS Computational Biology, Aug 2025

Epigenetic mechanisms play a crucial role in driving transcript expression and shaping the phenotypic plasticity of glioblastoma stem cells (GSCs), contributing to tumor heterogeneity and therapeutic resistance. These mechanisms dynamically regulate the expression of key oncogenic and stemness-associated genes, enabling GSCs to adapt to environmental cues and evade targeted therapies. Importantly, epigenetic reprogramming allows GSCs to transition between cellular states, including therapy-resistant mesenchymal-like phenotypes, underscoring the need for epigenetic-targeting strategies to disrupt these adaptive processes. Understanding these epigenetic drivers of gene expression provides a foundation for novel therapeutic interventions aimed at eradicating GSCs and improving glioblastoma outcomes. Using machine learning (ML), we employ cross-patient prediction of transcript expression in GSCs by combining epigenetic features from various sources, including ATAC-seq, CTCF ChIP-seq, RNAPII ChIP-seq, H3K27Ac ChIP-seq, and RNA-seq. We investigate different ML and deep learning (DL) models for this task and ultimately build our final pipeline using XGBoost. The model trained on one patient generalizes to the other 11 patients with high performance. Notably, H3K27Ac alone from a single patient is sufficient to predict gene expression in all 11 patients. Furthermore, the distribution of H3K27Ac peaks across the genomes of all patients is remarkably similar. These findings suggest that GSCs share a common distributional pattern of enhancer activity characterized by H3K27Ac, which can be utilized to predict gene expression in GSCs across patients. In summary, while GSCs are known for their transcriptomic and phenotypic heterogeneity, we propose that they share a common epigenetic pattern of enhancer activation that defines their underlying transcriptomic expression pattern. This pattern can predict gene expression across patient samples, providing valuable insights into the biology of GSCs.

Usability and adoption in a randomized trial of GutGPT a GenAI tool for gastrointestinal bleeding

Sunny Chung, Mauro Giuffrè, Niroop Rajashekar, and 8 more authors

npj Digital Medicine, 2025

Generative AI (GenAI) may enhance clinical decision support systems (CDSS), but its impact on adoption remains unclear. We conducted a simulation-based randomized trial to evaluate whether a GenAI-enhanced CDSS, “GutGPT,” improves adoption compared to an AI dashboard in acute upper gastrointestinal bleeding management. Clinical trainees were randomized to either GutGPT or a comparator dashboard across three cases. The primary outcome was Behavioral Intention, from the Unified Theory of Acceptance and Use of Technology (UTAUT). Secondary measures included additional constructs and decision accuracy. A total of 106 participants participated (52 GutGPT, 54 comparator). GutGPT users reported higher Effort Expectancy. Behavioral Intention had no significant difference. Qualitative analysis highlighted trust and workflow concerns. These findings suggest that usability alone is insufficient to drive adoption. As this study was conducted in a simulation without real-world integration or patient outcomes, further studies are needed. (Trial Registration: ClinicalTrials.gov; Identifier: NCT05816473; Registered March 6, 2023).

Recent Advances, Applications and Open Challenges in Machine Learning for Health: Reflections from Research Roundtables at ML4H 2024 Symposium

Amin Adibi, Xu Cao, Zongliang Ji, and 39 more authors

CoRR, Feb 2025

The fourth Machine Learning for Health (ML4H) symposium was held in person on December 15th and 16th, 2024, in the traditional, ancestral, and unceded territories of the Musqueam, Squamish, and Tsleil-Waututh Nations in Vancouver, British Columbia, Canada. The symposium included research roundtable sessions to foster discussions between participants and senior researchers on timely and relevant topics for the ML4H community. The organization of the research roundtables at the conference involved 13 senior and 27 junior chairs across 13 tables. Each roundtable session included an invited senior chair (with substantial experience in the field), junior chairs (responsible for facilitating the discussion), and attendees from diverse backgrounds with an interest in the session’s topic.

2024

Trajectory flow matching with applications to clinical time series modeling

Xi Zhang*, Yuan Pu*, Yuki Kawamura, and 4 more authors

NeurIPS 2024, 2024

Modeling stochastic and irregularly sampled time series is a challenging problem found in a wide range of applications, especially in medicine. Neural stochastic differential equations (Neural SDEs) are an attractive modeling technique for this problem, which parameterize the drift and diffusion terms of an SDE with neural networks. However, current algorithms for training Neural SDEs require backpropagation through the SDE dynamics, greatly limiting their scalability and stability. To address this, we propose Trajectory Flow Matching (TFM), which trains a Neural SDE in a simulation-free manner, bypassing backpropagation through the dynamics. TFM leverages the flow matching technique from generative modeling to model time series. In this work we first establish necessary conditions for TFM to learn time series data. Next, we present a reparameterization trick which improves training stability. Finally, we adapt TFM to the clinical time series setting, demonstrating improved performance on four clinical time series datasets both in terms of absolute performance and uncertainty prediction, a crucial parameter in this setting.
Human-Algorithmic Interaction Using a Large Language Model-Augmented Artificial Intelligence Clinical Decision Support System

Niroop Channa Rajashekar*, Yeo Eun Shin*, Yuan Pu*, and 13 more authors

CHI 2024, 2024

Integration of artificial intelligence (AI) into clinical decision support systems (CDSS) poses a socio-technological challenge that is impacted by usability, trust, and human-computer interaction (HCI). AI-CDSS interventions have shown limited benefit in clinical outcomes, which may be due to insufficient understanding of how health-care providers interact with AI systems. Large language models (LLMs) have the potential to enhance AI-CDSS, but haven’t been studied in either simulated or real-world clinical scenarios. We present findings from a randomized controlled trial deploying AI-CDSS for the management of upper gastrointestinal bleeding (UGIB) with and without an LLM interface within realistic clinical simulations for physician and medical student participants. We find evidence that LLM augmentation improves ease-of-use, that LLM-generated responses with citations improve trust, and HCI varies based on clinical expertise. Qualitative themes from interviews suggest the perception of LLM-augmented AI-CDSS as a team-member used to confirm initial clinical intuitions and help evaluate borderline decisions.

2023

Assessing the Usability of GutGPT: A Simulation Study of an AI Clinical Decision Support System for Gastrointestinal Bleeding Risk

Colleen Chan, Kisung You, Sunny Chung, and 10 more authors

ML4H 2023 (Findings), 2023

Applications of large language models (LLMs) like ChatGPT have potential to enhance clinical decision support through conversational interfaces. However, challenges of human-algorithmic interaction and clinician trust are poorly understood. GutGPT, an LLM for gastrointestinal (GI) bleeding risk prediction and management guidance, was deployed in clinical simulation scenarios alongside the electronic health record (EHR) with emergency medicine physicians, internal medicine physicians, and medical students to evaluate its effect on physician acceptance and trust in AI clinical decision support systems (AI-CDSS). GutGPT provides risk predictions from a validated machine learning model and evidence-based answers by querying extracted clinical guidelines. Participants were randomized to GutGPT and an interactive dashboard, or the interactive dashboard and a search engine. Surveys and educational assessments taken before and after measured technology acceptance and content mastery. Preliminary results showed mixed effects on acceptance after using GutGPT compared to the dashboard or search engine but appeared to improve content mastery based on simulation performance. Overall, this study demonstrates that LLMs like GutGPT could enhance effective AI-CDSS if implemented optimally and paired with interactive interfaces.