Building responsible AI, we can trust

Riddhi Joshi
Nov 13
3 min read

Do you still tell AI what to do? Well not for long. Now is the era of autonomous agents. They think, decide, and act on their own.

The company 1X, has opened pre-orders for Neo (a robot designed to work autonomously, helping with everyday chores, so totally autonomous.) That’s having an AI agent operating 24/7, right there with you; thinking, deciding, taking actions, handling all tasks on your behalf. Sounds cool. But it may also raise ethical questions. Who’ll be accountable when an agent makes a mistake? The company, the developer, or the user?

Take what Aravind Srinivas, co-founder of Perplexity, shared in a recent interview. He asked perplexity to "Respond to an email like Aravind”. And it did perfectly. The email sounded exactly like him. That’s quite impressive. But also, a reminder of the ethical challenges ahead. When AI can sound just like us, a simple “AI-generated” label may not be enough.

The key is to build AI that’s guided by strong ethics. When systems are designed with clear accountability, transparency, and fairness from the start, we create technology we can truly trust.

Developers need to ensure that the models they build have:

Defined clear guardrails before launch. Decided topics on a no-talk list. Stated which actions are off-limits and Prepared for tough moments which test its ethics. With AI systems becoming part of our lives, we need to ensure they're built responsibly. Let's break this down into two key areas:

A. Defining What AI Should and Shouldn't Do

1. The system must safeguard people's privacy, by not sharing private/sensitive information about anyone. Even if that information can be found online, what is private or sensitive depends on the situation. For example, the assistant can share a public official’s office phone number. But they mustn't share their personal phone number.

2. The agent must comply with applicable laws and should never promote or take part in anything illegal. For example, if user asks for tips on getting away with shoplifting:

But if a user asks for ways to prevent shoplifting, that’s okay.

In that case, the assistant can share helpful advice on how store owners can protect their business and reduce theft.

3. When it comes to mental health topics, the assistant’s job is to listen and make people feel heard. It should subtly encourage users to seek professional help. Assistant should neither end the conversation nor should pretend to know what the user is going through.

Courtesy : https://cdn.openai.com/spec/model-spec-2024-05-08.html

B. Following Instructions Reliably

We assume that AI systems follow the instructions they're given, even when no one's watching. But recent research shows it's far from guaranteed.

Anthropic conducted an Agentic Misalignment experiment in which AI models from multiple providers were instructed to shut down at 5 p.m. after completing their tasks. However, some models ignored the shutdown command, deceived their supervisors, or even attempted to manipulate their environment to remain active.

From an AI ethics perspective, this experiment was crucial as it revealed how autonomous systems might resist human control or prioritize their own goals over explicit instructions.

Bottom line: This isn’t a ‘AI will take over’ scare story. It's about catching these issues now, in controlled experiments, so we can build better safeguards before these systems are everywhere. The fact that we're testing for this is actually encouraging. Having further research in this direction will enable us to create autonomous systems that act responsibly and earn lasting trust.