NeMo RL provides a Generation Interface intended to ease the integration of other inference engines. In the current structure, however, the generation modules concentrate Ray orchestration, worker glue, sampling, and refit hooks in a few large files, so a new backend must duplicate much of that logic instead of plugging into a smaller abstraction. Some vLLM-specific branches also leak into distributed/worker_groups.py and policy/. Introducing a BaseGenerationBackend plus a simple backend registry would centralize the common behavior and make it much easier to add engines such as SGLang.
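As a rough illustration of the proposal, the sketch below shows one way a BaseGenerationBackend and a backend registry could fit together. Only the name BaseGenerationBackend comes from the text above; the method names (generate, update_weights), the helpers register_backend and create_backend, and the toy EchoBackend are hypothetical and do not reflect NeMo RL's actual API. Ray orchestration and real engine calls are deliberately left out.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type

# Hypothetical registry mapping a backend name to its class.
_BACKENDS: Dict[str, Type["BaseGenerationBackend"]] = {}


class BaseGenerationBackend(ABC):
    """Assumed base class: shared behavior lives here, engines override the rest."""

    def __init__(self, config: dict) -> None:
        self.config = config

    @abstractmethod
    def generate(self, prompts: List[str]) -> List[str]:
        """Return one completion per prompt (illustrative signature)."""

    @abstractmethod
    def update_weights(self, state: dict) -> None:
        """Refit hook: load new policy weights into the engine (illustrative)."""


def register_backend(name: str):
    """Class decorator that records a backend under `name`."""

    def decorator(cls: Type[BaseGenerationBackend]) -> Type[BaseGenerationBackend]:
        if name in _BACKENDS:
            raise ValueError(f"Backend {name!r} is already registered")
        _BACKENDS[name] = cls
        return cls

    return decorator


def create_backend(name: str, config: dict) -> BaseGenerationBackend:
    """Look up a registered backend by name and instantiate it."""
    try:
        cls = _BACKENDS[name]
    except KeyError:
        raise ValueError(
            f"Unknown backend {name!r}; registered: {sorted(_BACKENDS)}"
        ) from None
    return cls(config)


@register_backend("echo")
class EchoBackend(BaseGenerationBackend):
    """Toy stand-in for a real engine such as vLLM or SGLang."""

    def __init__(self, config: dict) -> None:
        super().__init__(config)
        self.weights_version = 0

    def generate(self, prompts: List[str]) -> List[str]:
        # Append a configurable suffix instead of running a model.
        suffix = self.config.get("suffix", "")
        return [prompt + suffix for prompt in prompts]

    def update_weights(self, state: dict) -> None:
        # A real backend would push `state` to its workers; here we only count calls.
        self.weights_version += 1
```

With a layout like this, callers would select an engine by name, for example create_backend("echo", {"suffix": "!"}), and an SGLang backend would only need to subclass the base class and register itself, without adding engine-specific branches to the shared worker-group or policy code.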