Researcher (Reinforcement Learning)

Amigo

Posted on May 13, 2025

About Amigo
We're helping enterprises build autonomous agents that reliably deliver specialized, complex services (healthcare, legal, and education) with practical precision and human-like judgment. Our mission is to build safe, reliable AI agents that organizations can genuinely depend on. We believe superhuman-level agents will become an integral part of our economy over the next decade, and we've developed our own agent architecture to solve the fundamental trust problem in AI.
Role
As a Researcher in Reinforcement Learning at Amigo, you'll develop our evolutionary chamber approach to continuous agent alignment. Working as part of our Research team, you'll create systems that enable agents to evolve under carefully designed pressures that align with organizational goals. Your work will focus on building efficient RL frameworks that optimize the integrated Memory-Knowledge-Reasoning (M-K-R) cycle, selectively targeting high-value capabilities rather than applying RL broadly. This research is essential for establishing the path from baseline capabilities to superhuman performance while maintaining strategic resource efficiency.
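To give a concrete flavor of what "selectively targeting high-value capabilities" could mean in practice, here is a minimal Python sketch. Everything in it (the Capability record, the scoring heuristic, the compute-budget model) is an illustrative assumption for candidates, not Amigo's actual system:

```python
# Hypothetical illustration only: none of these names come from Amigo's stack.
from dataclasses import dataclass

@dataclass
class Capability:
    name: str
    eval_score: float      # current benchmark score in [0, 1]
    business_value: float  # weight derived from enterprise-specific metrics
    rl_cost: float         # estimated compute (e.g., GPU-hours) to train it

def select_targets(capabilities: list[Capability], budget: float) -> list[Capability]:
    """Greedily pick capabilities with the best headroom-weighted value per cost."""
    ranked = sorted(
        capabilities,
        # Improving a weak (low eval_score), high-value, cheap capability pays most.
        key=lambda c: c.business_value * (1.0 - c.eval_score) / c.rl_cost,
        reverse=True,
    )
    chosen, spent = [], 0.0
    for cap in ranked:
        if spent + cap.rl_cost <= budget:
            chosen.append(cap)
            spent += cap.rl_cost
    return chosen

# Example: with a 100 GPU-hour budget, RL effort goes to triage, not to everything.
targets = select_targets(
    [
        Capability("clinical_triage", eval_score=0.62, business_value=1.0, rl_cost=60),
        Capability("smalltalk", eval_score=0.95, business_value=0.1, rl_cost=50),
    ],
    budget=100,
)
```

A greedy value-per-cost heuristic is only one possible reading of "strategic resource efficiency"; the actual allocation logic would also weigh domain-specific confidence requirements.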
Responsibilities
Design and implement reinforcement learning frameworks that enable targeted capability enhancement within the unified M-K-R cycle
Develop techniques for RLVR (Reinforcement Learning with Verifiable Rewards) that use outcome-based rewards checkable by external oracles or explicit success criteria (see the sketch after this list)
Create systems for iterated amplification and distillation that systematically improve capabilities across successive amplify-then-distill cycles
Research methods for self-play reasoning that allow agents to generate their own tasks and learn to solve them
Develop frameworks for continuous alignment that incorporate real-world interaction data and targeted simulations
Design reward functions that align precisely with enterprise-specific metrics and success criteria
Create multi-stage learning approaches that build upon our dynamic multi-step context engine and comprehensive evaluation system
Research techniques for balancing exploration of novel strategies with exploitation of proven approaches
Implement strategic resource-allocation methods that support different confidence requirements across domains
Collaborate with Simulation and Verification teams to create integrated evolutionary chambers
Contribute to research publications and the broader field of reinforcement learning and AI alignment
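For candidates less familiar with RLVR, the idea is to replace a learned preference model with an external, programmatic verifier. The sketch below is a toy illustration under that reading; the oracle, the batch of log-probabilities, and the REINFORCE-style objective are all hypothetical simplifications, not Amigo's training code:

```python
# Toy illustration only: the verifier, policy outputs, and update are simplified.
from typing import Callable

def verifiable_reward(response: str, verifier: Callable[[str], bool]) -> float:
    """Outcome-based reward: 1.0 if the external oracle accepts the response."""
    return 1.0 if verifier(response) else 0.0

def reinforce_loss(log_probs: list[float], rewards: list[float], baseline: float) -> float:
    """REINFORCE-style objective over a batch of sampled responses.

    In practice this would be computed on tensors and backpropagated
    through the language model; here it is plain arithmetic.
    """
    advantages = [r - baseline for r in rewards]
    return -sum(lp * a for lp, a in zip(log_probs, advantages)) / len(rewards)

# Example: an exact-match oracle stands in for a real verifier
# (a unit-test harness, a clinical coding validator, etc.).
oracle = lambda resp: resp.strip() == "42"
samples = ["42", "41", " 42 "]
rewards = [verifiable_reward(s, oracle) for s in samples]
loss = reinforce_loss([-1.2, -0.9, -1.5], rewards, baseline=sum(rewards) / len(rewards))
```

The appeal of the verifiable-reward setup is that the signal cannot be gamed by pleasing a learned preference model: a response either passes the oracle or it does not.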
Qualifications
PhD or equivalent research experience in reinforcement learning, AI, machine learning, or related fields
Strong understanding of reinforcement learning techniques, particularly in the context of language models
Experience with RLHF (Reinforcement Learning from Human Feedback) or related approaches
Background in designing reward functions, training protocols, or evaluation frameworks
Familiarity with techniques for iterative distillation, self-play, or recursive improvement
Strong programming skills with the ability to implement complex reinforcement learning systems
Understanding of the practical constraints of RL in production environments
Experience with efficient resource allocation in computationally intensive learning systems
Excellent research and analytical skills with the ability to design and run rigorous experiments
Strong communication skills for collaborating across research domains
Passion for building trustworthy AI that can be deployed responsibly in high-stakes domains
Location: NYC (Onsite)
To apply, send us your resume and anything else you'd like to careers@amigo.ai