AI Agent Evaluation Analyst
5 days ago
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.
At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.
What we do
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.
Who we're looking for:
We're looking for curious and intellectually proactive contributors: the kind of people who double-check assumptions and play devil's advocate.
Are you comfortable with ambiguity and complexity? Does an async, remote, flexible opportunity sound exciting? Would you like to learn how modern AI systems are tested and evaluated?
This is a flexible, project-based opportunity well-suited for:
- Analysts, researchers, or consultants with strong critical thinking skills.
- Students (senior undergrads / grad students) looking for an intellectually interesting gig.
- People open to a part-time and non-permanent opportunity.
About the project:
We're looking for QA specialists for autonomous AI agents to join a new project focused on validating and improving complex task structures, policy logic, and agent evaluation frameworks. Throughout the project, you'll balance quality assurance, research, and logical problem-solving. This project is ideal for people who enjoy looking at systems holistically and thinking through scenarios, implications, and edge cases.
You do not need a coding background, but you must be curious, intellectually rigorous, and capable of evaluating the soundness and consistency of complex setups. If you've excelled at consulting, CHGK (the "What? Where? When?" quiz game), Olympiads, case solving, or systems thinking, you might be a great fit.
What you'll be doing:
- Reviewing evaluation tasks and scenarios for logic, completeness, and realism.
- Identifying inconsistencies, missing assumptions, or unclear decision points.
- Helping define clear expected behaviors (gold standards) for AI agents.
- Annotating cause-effect relationships, reasoning paths, and plausible alternatives.
- Thinking through complex systems and policies as a human would to ensure agents are tested properly.
- Working closely with QA engineers, writers, or developers to suggest refinements or edge-case coverage.
How to get started:
Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.
Requirements
- Excellent analytical thinking: Can reason about complex systems, scenarios, and logical implications.
- Strong attention to detail: Can spot contradictions, ambiguities, and vague requirements.
- Familiarity with structured data formats: Can read (but not necessarily write) JSON/YAML.
- Ability to assess scenarios holistically: What's missing, what's unrealistic, what might break?
- Good communication and clear writing (in English) to document your findings.
We also value applicants who have:
- Experience with policy evaluation, logic puzzles, case studies, or structured scenario design.
- Background in consulting, academia, olympiads (e.g. logic/math/informatics), or research.
- Exposure to LLMs, prompt engineering, or AI-generated content.
- Familiarity with QA or test-case thinking (edge cases, failure modes, "what could go wrong").
- Some understanding of how scoring or evaluation works in agent testing (precision, coverage, etc.).
Benefits
- Get paid for your expertise, with rates that can go up to $17/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
-
AI Engineer
5 days ago
Buenos Aires, Buenos Aires C.F., Argentina · Elevation AI, Inc. · Full-time · US$180,000 - US$360,000 per year
Elevation AI is seeking a hands-on, inventive AI Engineer with deep expertise in agent development to design, build, and scale intelligent, autonomous systems. You'll be at the forefront of shaping how generative AI is applied in production, building agents that combine reasoning, planning, tool integration, and multi-step orchestration to transform...
-
Evaluation Scenario Writer
5 days ago
Buenos Aires, Buenos Aires C.F., Argentina · Mindrift · Full-time · US$27,200 - US$34,560 per year
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English. At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI. What we...
-
AI Agent Engineer
3 days ago
Buenos Aires, Buenos Aires C.F., Argentina · Rocket Lab · Full-time · $120,000 - $240,000 per year
AI Agent Engineer. About Rocket Lab: Rocket Lab is an App Growth Hub that drives the sustainable growth of mobile apps through strategies based on data, creativity, and technology. We help the world's most innovative brands scale their apps through user acquisition, engagement, and retention. Being part of Rocket Lab...
-
AI Agent Engineer
1 day ago
Buenos Aires, Buenos Aires C.F., Argentina · Rocket Lab | The App Growth Hub · Full-time · $1,200,000 - $2,400,000 per year
About Rocket Lab: Rocket Lab is an App Growth Hub that drives the sustainable growth of mobile apps through strategies based on data, creativity, and technology. We help the world's most innovative brands scale their apps through user acquisition, engagement, and retention. Being part of Rocket Lab means joining a...
-
AI Specialist
5 days ago
Buenos Aires, Buenos Aires C.F., Argentina · GoFundMe · Full-time · $250,000 - $450,000 per year
Want to help us, help others? We're hiring. GoFundMe is the world's most powerful community for good, dedicated to helping people help each other. By uniting individuals and nonprofits in one place, GoFundMe makes it easy and safe for people to ask for help and support causes, for themselves and each other. Together, our community has raised more than $40...
-
AI Specialist
3 days ago
Buenos Aires, Buenos Aires C.F., Argentina · GoFundMe · Full-time · $600,000 - $1,200,000 per year
Want to help us, help others? We're hiring. GoFundMe is the world's most powerful community for good, dedicated to helping people help each other. By uniting individuals and nonprofits in one place, GoFundMe makes it easy and safe for people to ask for help and support causes, for themselves and each other. Together, our community has raised more than $40...
-
Freelance AI Agent Trainer
3 days ago
Buenos Aires, Buenos Aires C.F., Argentina · Mindrift · Full-time · $60,000 - $120,000 per year
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency. At Mindrift, innovation meets opportunity. We believe in using the power of collective intelligence to ethically shape the future of AI. What...
-
Senior AI Software Engineer
5 days ago
Buenos Aires, Buenos Aires C.F., Argentina · Workana · Full-time · $600,000 - $1,200,000 per year
Workana is the largest remote work platform for talents in Latin America. Our new segment, Workana Premium, focuses on matching the most exceptional professionals with leading and innovative companies around the globe. Enjoy competitive compensation, dedicated support, and the flexibility of remote work within a dynamic environment that fosters collaboration...
-
AI Business Analyst
5 days ago
Buenos Aires, Buenos Aires C.F., Argentina · PSI CRO · Full-time · $60,000 - $120,000 per year
Company Description: We are the company that cares: for our staff, for our clients, for our partners and for the quality of the work we do. A dynamic, global company founded in 1995, we bring together more than 2,800 driven, dedicated and passionate individuals. We work on the frontline of medical science, changing lives, and bringing new medicines to...
-
Agent, MCP
3 days ago
Buenos Aires, Buenos Aires C.F., Argentina · Talan · Full-time · $60,000 - $120,000 per year
Company Description: Talan – Positive Innovation. Talan is an international consulting group specializing in innovation and business transformation through technology. With over 7,200 consultants in 21 countries and a turnover of €850M, we are committed to delivering impactful, future-ready solutions. Talan at a Glance: Headquartered in Paris and operating...