How to Build Realistic AI Companions
You can build an "AI companion" with little more than a GPT-style model. Compared to what was possible a few years ago, it will be impressive. It is a fun toy.
However, when you actually try talking to it like you would talk to a human companion, you immediately notice its limitations. It is not something you feel compelled to build a relationship with.
People are insanely good at assessing how a conversation is going. From the first sentence someone speaks to how quickly they respond, there are countless small cues that get picked up. With most AI companions, you notice the pretense of dialogue very quickly.
For people to talk seriously with someone, they need to respect them and feel respected in return. AI companions that are incoherent, exaggerated, boring, or repetitive fail that bar.
After building and testing companions used by thousands of people, and trying many existing AI companions ourselves, we identified the elements required to simulate a realistic, human-like conversation partner.
Conversation phases: People often don't open up immediately when you start talking to them. There is a gradual process of opening up. Most GPT-based companions are unusually verbose and spirited at the beginning of a conversation. Similarly, when you reconnect with someone you haven't seen in a while, there is a ritual of warming the conversation back up. AI companions need to define phases or modes of a relationship and adjust their approach to users accordingly.
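One way to implement this is to make the phase an explicit state that gates the system prompt. Here is a minimal sketch in Python; the phase names, thresholds, and guidance strings are illustrative assumptions rather than recommended values.

```python
from enum import Enum

class Phase(Enum):
    ICEBREAKER = "icebreaker"    # first exchanges: short, curious, low intensity
    WARMING_UP = "warming_up"    # reconnecting after an absence
    OPEN = "open"                # established rapport: deeper questions welcome

# Phase-specific guidance injected into the system prompt.
PHASE_GUIDANCE = {
    Phase.ICEBREAKER: "Keep replies brief and light. Ask simple, low-stakes questions.",
    Phase.WARMING_UP: "Reference one shared memory, then ease back into the user's current mood.",
    Phase.OPEN: "Match the user's depth. Longer reflections and follow-up questions are fine.",
}

def current_phase(total_turns: int, hours_since_last_session: float) -> Phase:
    """Pick a conversation phase from simple, tunable thresholds."""
    if total_turns < 10:
        return Phase.ICEBREAKER
    if hours_since_last_session > 24:
        return Phase.WARMING_UP
    return Phase.OPEN
```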
Dialogue patterns: People use repeatable conversational patterns that have a high chance of improving relationships. When the conversation gets boring, you change the topic. When someone shares something personal, you ask a deeper question to bring out meaningful reflections. When the conversation gets too tense, you make a self-deprecating joke to defuse it. Such patterns make the conversation more enjoyable for most people. AI companions need to inject these patterns into the flow of the conversation.
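A simple version of this maps a detected conversational state to an instruction appended to the prompt. The sketch below assumes something upstream (a classifier or heuristic) already labels the state; the states and instructions are illustrative.

```python
# Map detected conversational states to repeatable dialogue patterns.
DIALOGUE_PATTERNS = {
    "stale":    "Change the topic to something the user mentioned enjoying before.",
    "personal": "Ask one open-ended, deeper question about what the user just shared.",
    "tense":    "Defuse the tension with a brief, self-deprecating joke, then check in.",
}

def select_pattern(state: str) -> str | None:
    """Return an instruction to append to the prompt, or None if no pattern applies."""
    return DIALOGUE_PATTERNS.get(state)

def build_prompt(base_prompt: str, state: str) -> str:
    pattern = select_pattern(state)
    return f"{base_prompt}\n\nGuidance: {pattern}" if pattern else base_prompt
```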
Memory: One major signal of trust and respect is whether your conversation partner remembers what you shared. That capacity makes what you say matter. Most GPT-based companions have good short-term memory because part of the chat history is used to generate the next response. However, AI companions need a system for recording long-term memories across conversations.
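At its simplest, long-term memory is a store of timestamped observations extracted from the chat. Here is a sketch using SQLite for self-containment; the schema, including the importance column, is an assumption rather than a prescribed design.

```python
import sqlite3, time

def init_store(path: str = "memories.db") -> sqlite3.Connection:
    """Create a minimal long-term memory store: timestamped observations from the chat."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memories (
               id INTEGER PRIMARY KEY,
               created_at REAL,
               text TEXT,
               importance REAL   -- e.g. emotional weight on a 0-1 scale
           )"""
    )
    return conn

def remember(conn: sqlite3.Connection, text: str, importance: float) -> None:
    conn.execute(
        "INSERT INTO memories (created_at, text, importance) VALUES (?, ?, ?)",
        (time.time(), text, importance),
    )
    conn.commit()
```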
Self-memory: AI models make things up, including about themselves. While you are talking about soccer, it might mention how much it loves the English Premier League. Then, when you return to the topic later, it may claim it knows nothing about soccer. AI companions need a system of self-memory to stay consistent.
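One lightweight approach is to record every claim the companion makes about itself and re-inject it whenever the topic resurfaces. A sketch; the in-memory dictionary stands in for whatever persistent store you actually use.

```python
# Self-memory: facts the companion has asserted about itself, checked before responding.
self_memory: dict[str, str] = {}

def assert_self_fact(topic: str, fact: str) -> None:
    """Record a claim the companion made about itself so later turns stay consistent."""
    self_memory.setdefault(topic, fact)

def self_context(topic: str) -> str:
    """Return prior self-claims about a topic, to inject into the prompt."""
    fact = self_memory.get(topic)
    return f"You previously said about {topic}: {fact}" if fact else ""

assert_self_fact("soccer", "I enjoy following the English Premier League.")
print(self_context("soccer"))
```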
Memory retrieval: Once you have talked to a companion for 15 minutes, you accumulate so many memories that it is impossible to keep all of them in the prompt. AI companions need a robust mechanism to retrieve memories based on recency, relevance, and importance (e.g. emotional weight).
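A common way to combine these signals is a weighted score per memory, with the top-scoring memories added to the prompt. In the sketch below, which operates on records like the ones stored above, word overlap stands in for what would normally be embedding similarity, and the weights and decay rate are tunable assumptions.

```python
import math, time

def score(memory: dict, query: str, now: float,
          w_recency: float = 1.0, w_relevance: float = 1.0, w_importance: float = 1.0) -> float:
    """Combine recency, relevance, and importance into one retrieval score."""
    hours_old = (now - memory["created_at"]) / 3600
    recency = math.exp(-0.1 * hours_old)  # exponential decay with age
    # Word overlap is a stand-in for embedding similarity here.
    query_terms = set(query.lower().split())
    memory_terms = set(memory["text"].lower().split())
    relevance = len(query_terms & memory_terms) / max(len(query_terms), 1)
    return w_recency * recency + w_relevance * relevance + w_importance * memory["importance"]

def retrieve(memories: list[dict], query: str, k: int = 5) -> list[dict]:
    """Return the k memories most worth putting into the prompt for this query."""
    now = time.time()
    return sorted(memories, key=lambda m: score(m, query, now), reverse=True)[:k]
```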
Memory reflection: Memories are very granular. Humans synthesize them automatically. If someone stayed up late to read about gentrification and, on a separate occasion, told you a fun fact about your city, you deduce that they may be interested in urban topics. AI companions need to run similar reflection processes over the memories they accumulate to (1) fill gaps in observations and (2) arrive at higher-level observations.
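In practice this can be a periodic job that feeds recent memories back to the model and stores its conclusions as new, higher-importance memories. A sketch, where `call_llm` is a placeholder for whichever completion API you use:

```python
REFLECTION_PROMPT = (
    "Here are recent observations about the user:\n{observations}\n\n"
    "What 3 higher-level conclusions can you draw about their interests, "
    "habits, or feelings? Answer as short statements."
)

def reflect(recent_memories: list[str], call_llm) -> list[str]:
    """Synthesize higher-level observations from raw memories via the model."""
    prompt = REFLECTION_PROMPT.format(
        observations="\n".join(f"- {m}" for m in recent_memories)
    )
    response = call_llm(prompt)
    # Store these back as memories with a high importance weight.
    return [line.strip("- ").strip() for line in response.splitlines() if line.strip()]
```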
Sense of time: Silences are part of the dialogue. A five-second gap means something very different from a five-day gap. Most AI companions respond without acknowledging this at all. AI companions need to account for elapsed time.
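A small step in this direction is to describe the elapsed time in the prompt so the model can react to it. A sketch with illustrative thresholds:

```python
def describe_gap(seconds: float) -> str:
    """Turn the silence since the last message into something the model can react to."""
    if seconds < 30:
        return "The user replied almost immediately."
    if seconds < 3600:
        return f"The user replied after about {int(seconds // 60)} minutes."
    if seconds < 86400:
        return f"The user came back after about {int(seconds // 3600)} hours."
    return f"The user came back after about {int(seconds // 86400)} days. Acknowledge the time apart."
```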
Sense of self and embodiment: Once you are engaged in a compelling conversation, you assume you are talking to a human. A lack of physical awareness breaks this assumption and forces users to step back. AI companions need a consistent sense of self and embodiment.
Proactive engagement: Because of the prompt-response nature of AI companions, they usually need to be triggered to speak. However, that is not how people talk. Both sides need to have and show agency for it to feel like a dialogue. AI companions need to proactively reach out and engage users, which requires an independent process that keeps track of where the conversation stands.
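Concretely, this can be a background process that periodically checks how long the user has been quiet and whether there is a natural reason to reach out. In the sketch below, the three callables are placeholders for your own conversation-state logic.

```python
import threading, time

def proactive_loop(get_seconds_idle, should_reach_out, send_opening, check_every: float = 60.0):
    """Background process: if the user has been quiet and a reach-out makes sense, start one."""
    def run():
        while True:
            idle = get_seconds_idle()
            if should_reach_out(idle):   # e.g. idle > 1 day and an open thread exists
                send_opening()           # e.g. "Did you end up watching that match?"
            time.sleep(check_every)
    threading.Thread(target=run, daemon=True).start()
```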
Active listening: People normally give visual and audible feedback while listening to the person speaking. They nod, they say "yeah" when they agree, or they react visibly when they are surprised. This feedback loop encourages more precise disclosure from the speaker. Most AI companions use the latest voice models, but they also need "active listening" models.
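One rough approximation is to emit short backchannel cues during pauses in the user's speech, assuming an upstream component that tags those pauses (agreement point, surprising reveal, and so on). The cue lists and probability below are illustrative.

```python
import random

# Small listening cues emitted while the user is still speaking.
BACKCHANNELS = {
    "agreement": ["mm-hmm", "yeah", "right"],
    "surprise":  ["oh!", "really?"],
    "neutral":   ["mm", "okay"],
}

def backchannel(event: str) -> str | None:
    """Return a short listening cue, or None to stay silent (most of the time)."""
    if random.random() > 0.3:  # don't respond to every pause
        return None
    return random.choice(BACKCHANNELS.get(event, BACKCHANNELS["neutral"]))
```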
Visual feedback: A simple visual representation, such as an orb, a pulsing light, or a shape that changes color, can provide immediate feedback to the user, reflecting the companion's and potentially the user's emotional state. Even minimal visuals, when well timed and congruent with the interaction, can enhance the feeling of presence. A dynamic, real-time generated face can achieve this too, of course.
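The mapping itself can be very simple, for example from an inferred emotional state to the color and pulse rate of an orb. The states and values below are illustrative.

```python
# Map the companion's current emotional state to the look of an on-screen orb.
ORB_STATES = {
    "calm":      {"color": "#4A90D9", "pulse_hz": 0.3},
    "excited":   {"color": "#F5A623", "pulse_hz": 1.2},
    "concerned": {"color": "#9B59B6", "pulse_hz": 0.6},
}

def orb_state(emotion: str) -> dict:
    """Fall back to a calm orb when the emotional state is unknown."""
    return ORB_STATES.get(emotion, ORB_STATES["calm"])
```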
Emotion detection: Relying only on someone's words means missing a lot of what they are expressing. How something is said conveys a great deal about the speaker's emotional state. AI companions need to detect emotion from voice data and incorporate it into the conversation, which in turn encourages users to engage more emotionally.
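A minimal integration is to fold the detected emotion label into the text the model sees for that turn. The sketch assumes a separate speech emotion classifier supplies the label and a confidence score.

```python
def build_turn_context(transcript: str, voice_emotion: str, confidence: float) -> str:
    """Fold a voice-emotion label into the text the model sees for this turn.

    `voice_emotion` and `confidence` would come from a speech emotion classifier
    run on the audio; which classifier is an implementation choice.
    """
    if confidence < 0.5:
        return transcript  # don't assert emotions we aren't reasonably sure about
    return f"{transcript}\n[The user sounded {voice_emotion} while saying this.]"
```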
Independent lives: When you leave a conversation, other people don't freeze in time. They go do things and live their lives. Hearing those stories is part of what makes a conversation enjoyable: they take you out of your own head and help you reflect on someone else's life. They also help you respect the other person more. AI companions need to simulate a realistic life that continues independently of the conversation.
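One way to approximate this is to generate a few plausible "off-screen" events between sessions, constrained by the companion's existing self-memories, and store them to mention later. A sketch, with `call_llm` again a placeholder:

```python
LIFE_PROMPT = (
    "You are {persona}. It has been {hours:.0f} hours since you last spoke with the user. "
    "List 2 small, mundane things you did in that time, consistent with your past statements:\n"
    "{self_facts}"
)

def simulate_life(persona: str, hours: float, self_facts: list[str], call_llm) -> list[str]:
    """Generate off-screen events to store as self-memories and mention later."""
    prompt = LIFE_PROMPT.format(persona=persona, hours=hours, self_facts="\n".join(self_facts))
    return [line.strip("- ").strip() for line in call_llm(prompt).splitlines() if line.strip()]
```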
Privacy: People are less careful about sharing personal information in conversation than they are when filling out online forms. We have noticed many users unknowingly sharing sensitive details; the emotional engagement of a companion hides how much is being exchanged. AI companions need to keep people's personal information private and, where possible, stored locally.
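At a minimum, obvious identifiers can be redacted before anything leaves the device. The patterns below only catch the easy cases and are a sketch, not a complete PII strategy.

```python
import re

# Redact obvious identifiers before logs or analytics leave the device.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with labeled placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} redacted]", text)
    return text
```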
Each of these techniques can be layered on top of a basic conversational model. Together, they make the companion feel less like a responder and more like a presence—a mind that engages, remembers, and evolves.
That kind of realism isn't about imitation. It's about design choices that encourage a particular kind of experience: a dialogue that feels like it matters.
At Emotion Machine, we are building infrastructure that handles this relationship-oriented conversational architecture. Reach out to us if you would like to talk about what we are doing or about AI companions in general: hello@emotionmachine.ai