Researcher (Interpretability)

Amigo

Posted on May 13, 2025

About Amigo
We're helping enterprises build autonomous agents that reliably deliver specialized, complex services in healthcare, legal, and education with practical precision and human-like judgment. Our mission is to build safe, reliable AI agents that organizations can genuinely depend on. We believe superhuman-level agents will become an integral part of our economy over the next decade, and we've developed our own agent architecture to solve the fundamental trust problem in AI. Learn more here.
Role
As a Researcher in Interpretability at Amigo, you'll advance our ability to provide transparent attribution and traceability in agent reasoning processes. Working as part of our Research team, you'll develop techniques to overcome the "token bottleneck" limitation of current LLMs and create systems that can explain precisely how agents derive their responses. Your work will focus on enhancing both contextual source tracing (understanding which messages informed a response) and fine-grained attribution within responses (linking specific parts of output to input sources). This research is critical for building trustworthy AI systems that can be deployed in high-stakes domains.
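To make the distinction between these two granularities concrete, here is a minimal sketch of how they might be represented. The dataclasses below are hypothetical illustrations for this posting, not Amigo's actual data model.

```python
# Illustrative sketch only: hypothetical types showing the two attribution
# granularities described above (message-level source tracing vs. span-level
# attribution within a response). Not Amigo's actual system.
from dataclasses import dataclass, field


@dataclass
class MessageSource:
    """Contextual source tracing: which prior message informed the response."""
    message_id: str
    relevance: float  # e.g. a normalized contribution score


@dataclass
class SpanAttribution:
    """Fine-grained attribution: link a span of the output to its input sources."""
    output_start: int  # character offsets within the agent response
    output_end: int
    sources: list[MessageSource] = field(default_factory=list)


@dataclass
class AttributedResponse:
    text: str
    message_sources: list[MessageSource] = field(default_factory=list)
    span_attributions: list[SpanAttribution] = field(default_factory=list)
```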
Responsibilities
Research and develop methods for capturing, analyzing, and visualizing model activations during response generation to enable fine-grained attribution
Create techniques to link specific parts of agent responses to their precise sources in previous messages and knowledge bases
Design and implement enhanced interpretability scenarios for agent reflection, explicit reasoning, and post-session analysis
Develop attribution methods that work across our layered memory architecture (L0, L1, L2) to provide comprehensive traceability
Improve and refine our system for analyzing LLM internals to extract reasoning paths and attention patterns (a minimal sketch of attention capture follows this list)
Collaborate with safety researchers to ensure interpretability mechanisms enhance agent alignment
Bridge current external scaffolding approaches with future neuralese capabilities by developing attribution methods that scale with architectural advances
Publish research findings and contribute to the broader field of AI interpretability
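As a point of reference for the attention-analysis work above, the sketch below shows one simple way such signals can be surfaced from an open model: it runs GPT-2 through Hugging Face transformers with attention outputs enabled and reports which context tokens the final position attends to most strongly. The model choice, the toy conversation, and the layer/head averaging are illustrative assumptions, not a description of Amigo's pipeline.

```python
# Illustrative sketch only: surface a crude attention-based attribution signal
# from an open model (GPT-2 via Hugging Face transformers). Not Amigo's
# implementation; model, prompt, and aggregation are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

context = (
    "User: My knee hurts when I run.\n"
    "Agent: How long has this been happening?\n"
    "User: About two weeks.\n"
    "Agent:"
)
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple of (batch, heads, seq, seq) tensors, one per layer.
# Average over layers and heads, then inspect how the final position (the next-token
# prediction) distributes attention over the context tokens.
attn = torch.stack(outputs.attentions).mean(dim=(0, 2))[0]  # (seq, seq)
last_token_attn = attn[-1]  # attention from the last position to every context token

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
top = last_token_attn.topk(5)
for score, idx in zip(top.values, top.indices):
    print(f"{tokens[int(idx)]!r}: {score.item():.3f}")
```

Raw attention weights are only a weak attribution signal on their own; in practice they would likely be combined with gradient-based or mechanistic techniques, which is exactly the kind of question this role would investigate.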
Qualifications
PhD or equivalent research experience in AI, machine learning, or related fields with a focus on interpretability, attribution, or explainability
Strong understanding of transformer architecture internals, activation patterns, and attention mechanisms
Experience with LLM interpretability techniques such as activation analysis, mechanistic interpretability, or attribution methods
Background in working with model internals to extract reasoning patterns and attribution signals
Familiarity with research on the "token bottleneck" problem and approaches to address it
Solid programming skills with experience implementing research prototypes
Excellent research and analytical skills with the ability to design and run rigorous experiments
Strong communication skills for conveying complex technical concepts
Passion for building trustworthy AI that can be deployed responsibly in high-stakes domains
Location: NYC (Onsite)
To apply, send us your resume and anything else you'd like to careers@amigo.ai