Samin Yeasar Arnob

Latest News

March 2025. “Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts” was accepted at the ICLR 2025 Workshop on MCDC! 🎉
- Paper, Poster, Code
- Keywords: Sparse adapter, Parameter-efficient finetuning, Model merging, LLM
- TL;DR: We explore sparse adapters as a simpler and more effective building block for modular, parameter-efficient architectures, demonstrating superior model merging performance at scale.
March 2025. “Sparse-Reg: Improving Sample Complexity of Offline Reinforcement Learning using Sparse Regularization” was accepted at RLDM 2025! 🎉
- Paper, Code
- Keywords: Offline Reinforcement Learning, Sparsity, Regularization, Sample Complexity, Continuous Control.
- TL;DR: We introduce “Sparse-Reg,” a regularization technique that mitigates overfitting in offline reinforcement learning with small datasets, improving performance in continuous control tasks.
September 2024. “Efficient Reinforcement Learning by Discovering Neural Pathways” was accepted at NeurIPS 2024! 🎉
- Project Page, Paper, Code
- Keywords: Energy Efficient AI, Parameter Efficient, Neural Pathways, Continuous Control, Online Reinforcement Learning, Offline Reinforcement Learning, Multitask Reinforcement Learning.
- TL;DR: To improve energy efficiency and reduce the carbon footprint, we propose Neural Pathways, which use the network parameter space efficiently for reinforcement learning.

I am a visiting student researcher at Microsoft Research, Montreal, and a Ph.D. student in Computer Science at McGill University and Mila Quebec AI Institute, working with Dr. Doina Precup.

My research focuses on “Improving the Learning Capacity and Parameter Efficient training for RL and LLMs”. I work on using neural networks efficiently, taking inspiration from the human brain: multiple specialized pathways run through a single network, with each pathway focusing on a single task. This is an alternative to “routing” and mixture-of-experts structures that can be added to LLMs.

I’m interested in Mixture of Experts (MoE), parameter-efficient fine-tuning (PEFT) for LLMs, preference fine-tuning using Reinforcement Learning (RL), LLM alignment, and improving the mergeability of a mixture of experts.

Previously, I did applied research internships at Microsoft Research, New York (summer 2023, hosts: John Langford, Alex Lamb), Ubisoft La Forge, Montreal (2021-2022, host: Joshua Romoff), and Mila Quebec AI Institute (2019, host: Doina Precup).

I completed my Master’s in Electrical and Computer Engineering at McGill University. My master’s research was on “Adversarial Inverse Reinforcement Learning” under the supervision of Dr. Aditya Mahajan at the Centre for Intelligent Machines (CIM).

News

March 2024. I joined Microsoft Research, Montreal as a part-time research intern. I will be working on sparse adapters for efficient fine-tuning and model merging at scale for LLMs.
May, 2023 - Aug, 2023. I worked at Microsoft Research, New York as a Research Intern on an applied RL project with John Langford and Alex Lamb.
October, 2021. Two papers were accepted at the Offline Reinforcement Learning Workshop, NeurIPS 2021:
- “Importance of Empirical Sample Complexity Analysis for Offline Reinforcement Learning” - Paper
- “Single-Shot Pruning for Offline Reinforcement Learning” - Paper
Sep, 2021 - Aug, 2022. I worked at Ubisoft, Montreal with Joshua Romoff as a Research Intern.
June, 2020. “Off-Policy Adversarial Inverse Reinforcement Learning” was accepted at the Lifelong Learning Workshop, ICML 2020. Paper, Code, Talk
January, 2020. I started my Ph.D. at McGill University.
June, 2019. I joined Mila as a Research Intern.
June, 2019. “Doubly Robust Estimators in Off-Policy Actor-Critic Algorithms” was accepted for a spotlight presentation at RLDM 2019.
January, 2018. I started my Master’s at McGill University.

Research Interest

Reinforcement Learning: Improving LLMs, Imitation Learning, Offline Reinforcement Learning, Multitask Learning, Representation Learning
Large Language Models: Mixture of Experts (MoE), improving the mergeability of a mixture of experts, parameter-efficient fine-tuning (PEFT), preference fine-tuning, LLM alignment