This work introduces a model-agnostic framework to interpret closed-API LLMs. We train an efficient, standalone interpreter to provide sentence-level importance scores, quantifying prompt influence on generated text without further API calls.
Exploring new methods to understand and debug Retrieval-Augmented Generation (RAG) models using concept-based explanations.
Developed and trained a sequence-to-sequence model in PyTorch for generating simple, valid Python programs from natural language prompts.
Sharif University of Technology
Currently working on my Master's Thesis on "Interpretability in Generative Models: Investigating the Mechanisms Behind Output Generation in Large Language Models."
Exploring neurosymbolic AI and LLM reasoning, and their integration with psychology, as future AI research directions.