I am an undergraduate majoring in Computer Science at George Mason University, advised by Prof. Chaowei Yang and Dr. Zifu Wang. My future research directions are pragmatic interpretability and the safety and value alignment of language models (i.e., scalable oversight, deceptive alignment, and superalignment). My current research focuses on evaluating language models on domain-specific tasks, digital twins, and their intersection. You can find my CV here.
My answers to the Hamming question ("What are the most important problems [that you should probably work on]?"):
- How can we rigorously evaluate alignment and safety properties of language models beyond benchmark performance?
- How can interpretability methods be made actionable for safety, rather than purely descriptive?
- How can digital twins serve as testbeds for alignment and safety research?
Feel free to reach out! Email: ymasri[at]gmu[dot]edu or yahya[dot]masri[at]yahoo[dot]com
Blog (Thoughts and Writings)
What is AI Alignment?
AI Alignment is the study and evaluation of whether a model's behavior remains consistent with human intentions, ethical norms, and safety constraints.
Red-Teaming: This approach generates adversarial scenarios that induce unaligned outputs or actions in AI systems in order to evaluate the robustness of their alignment under intentional pressure. It includes both human and automated adversarial training.
Harmlessness & Misuse Testing:
Truthfulness and Honesty Evaluations:
Instruction-Following and Jailbreak Resistance:
High-Stakes “Agentic” Scenario Evaluations:
Other Evaluation Approaches and Considerations: Chain-of-thought and rationale inspection,
News
- 2025.12: Presented a poster on Building a Computing Infrastructure Digital Twin at AGU25. Poster here.
- 2025.12: Our paper Automating Data Collection to Support Conflict Analysis was published in Cloud Computing and Data Science.
- 2025.09: I collaborated with and mentored an ASSIP Fellow during Summer 2025, leading to a successfully published abstract. Abstract here.
- 2025.07: My first first-author paper Comparative Analysis of BERT and GPT for Classifying Crisis News with Sudan Conflict as an Example was published in MDPI Algorithms.
- 2025.07: Our work on Optimizing context-based location extraction by tuning open-source LLMs with RAG was published in the International Journal of Digital Earth (IJDE).
- 2025.05: I graduated from Northern Virginia Community College with an A.S. in Computer Science.
More news
- 2024.12: I co-authored a poster on automating map-making via RAG-enhanced geographic information extraction with LLMs, presented at AGU24.
- 2024.11: I presented updated findings on context-aware location extraction at the NSF Spatiotemporal Innovation Center (STC) Industry Advisory Board at George Mason University.
- 2024.07: I presented research on conflict incident classification using a BERT model at the International Symposium of Spatiotemporal Data Science 2024.
- 2024.05: I presented findings on context-aware location extraction at the STC Industry Advisory Board meeting hosted at Harvard University.
Publications
* denotes equal contribution, α denotes core contributors, and † denotes corresponding author

Yahya Masri*, Anusha Srirenganathan Malarvizhi*, Samir Ahmed*, Tayven Stover*, Zifu Wang†, Daniel Rothbart, Mathieu Bere, David Wong, Dieter Pfoser, and Chaowei Yang
This paper introduces Data: an open-access, source-attributed Sudan conflict news corpus collected hourly, comprising 6,946 archived articles (as of Oct 25, 2024) from national, regional, and international outlets, with full text and URLs preserved for transparency and verification.

Zifu Wangα, Yahya Masriα, Anusha Srirenganathan Malarvizhi, Tayven Stover, Samir Ahmed, David Wong, Yongyao Jiang, Yun Li, Mathieu Bere, Daniel Rothbart, Dieter Pfoser, David Marshall, and Chaowei Yang†
This paper makes the first systematic investigation of context-based location extraction with open-source LLMs, demonstrating that RAG substantially improves multi-incident, multi-location extraction accuracy.

Yahya Masriα, Zifu Wangα, Anusha Srirenganathan Malarvizhi, Samir Ahmed, Tayven Stover, David WS Wong, Yongyao Jiang, Yun Li, Qian Liu, Mathieu Bere, Daniel Rothbart, Dieter Pfoser, and Chaowei Yang†
We introduce a systematic evaluation framework for Sudan conflict event classification that benchmarks GPT-style LLM prompting against BERT/BERT-large baselines.
Teaching & Mentoring
- ASSIP Fellow mentoring (Summer 2025)
Miscellaneous
- Projects: Glimpse (AI platform that converts product one-liners into cinematic marketing videos), SoccerBot (LLM-powered soccer prediction and analysis app with RAG), BlueTemp (AI platform for predicting sea water temperatures), Crushor (retro platform game)
Education
- 2025.05 - Present, George Mason University
Undergraduate Student in Computer Science
- 2024.02 - Present, Spatiotemporal Innovation Center
Research Assistant
Advisor: Prof. Chaowei Yang at Geography and Geoinformation Science, George Mason University
- 2023.08 - 2025.05, Northern Virginia Community College
Undergraduate Student in Computer Science
