A conversation with AI can typically be realized in two ways: Speech-to-Speech (S2S) systems, or cascaded systems that chain Speech-to-Text (STT), LLM, and Text-to-Speech (TTS) modules. S2S systems are responsive but tend to be less knowledgeable, while cascaded systems can suffer from noticeable lag.
KAME connects an S2S model with a backend LLM to combine their strengths. The S2S model handles the fast response loop, while the backend LLM runs asynchronously on a slower cycle and injects "oracle" signals as they become available. More concretely, as the user's speech grows, the backend LLM is repeatedly called on the expanding partial transcript to provide progressively refined response signals.
KAME enables the S2S model to start responding immediately, while the LLM runs in parallel to continuously reason about the response. This shifts the paradigm from "think then speak" to "speak while thinking."
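The "speak while thinking" loop can be illustrated with a toy asyncio simulation. This is a minimal sketch under stated assumptions, not KAME's implementation: `OracleBoard`, `fake_backend_llm`, `frontend_loop`, and `backend_loop` are hypothetical names, the user's "speech" is a list of words, and the sleep durations merely stand in for a fast frame period and a slower LLM call. It shows only the scheduling pattern: the front-end emits a frame on every tick using whatever hint is currently posted, while the backend is re-called on the growing partial transcript and overwrites the hint each time it finishes.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class OracleBoard:
    """Latest backend hint visible to the front-end (hypothetical 'oracle' slot)."""
    latest: Optional[str] = None
    version: int = 0

    def post(self, hint: str) -> None:
        self.latest = hint
        self.version += 1


async def fake_backend_llm(partial_transcript: str) -> str:
    """Hypothetical stand-in for a slow backend LLM call; no real API is used."""
    await asyncio.sleep(0.03)  # simulated reasoning time, slower than one frame
    return f"hint for {partial_transcript!r}"


async def backend_loop(transcript: list[str], board: OracleBoard,
                       done: asyncio.Event) -> None:
    """Slow cycle: re-query the backend whenever the partial transcript has grown."""
    seen = 0
    while not done.is_set():
        if len(transcript) > seen:
            seen = len(transcript)
            board.post(await fake_backend_llm(" ".join(transcript[:seen])))
        else:
            await asyncio.sleep(0.005)  # nothing new yet; poll again shortly


async def frontend_loop(user_words: list[str], board: OracleBoard,
                        transcript: list[str], done: asyncio.Event
                        ) -> list[tuple[str, Optional[str]]]:
    """Fast cycle: each frame ingests user input and emits output immediately,
    conditioned on whatever hint exists right now (it never waits for the backend)."""
    frames: list[tuple[str, Optional[str]]] = []
    for word in user_words:
        transcript.append(word)              # the user's speech keeps growing
        frames.append((word, board.latest))  # respond now with the current hint
        await asyncio.sleep(0.01)            # simulated frame period
    done.set()                               # tell the backend loop to stop
    return frames


async def run_session(user_words: list[str]):
    board, transcript, done = OracleBoard(), [], asyncio.Event()
    frames, _ = await asyncio.gather(
        frontend_loop(user_words, board, transcript, done),
        backend_loop(transcript, board, done),
    )
    return frames, board.version


if __name__ == "__main__":
    words = "how many brothers does david have in total".split()
    frames, n_hints = asyncio.run(run_session(words))
    for word, hint in frames:
        print(f"{word:>9} | hint: {hint}")
    print("backend hints posted:", n_hints)
```

In this toy run the first frames go out with no hint at all, and later frames pick up progressively newer hints. The front-end reads only the most recent hint and never blocks, so a slow backend can delay the quality of the guidance but not the timing of the output.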
Comparison Between KAME and Moshi
KAME is compared with Moshi to demonstrate both its responsiveness and its knowledge. Moshi itself is a remarkable system. Building a full-duplex S2S model with near-human response latency is a long-standing challenge, because the model must listen, think, and speak simultaneously under tight timing constraints; Moshi's architecture makes this possible.
Yet low latency creates an unavoidable trade-off. The faster a model must respond, the less time it has to think, which limits the knowledge it can deliver. KAME is designed to break this trade-off by letting the system "speak while thinking": it allows a backend LLM to continue reasoning in parallel without sacrificing responsiveness, so the system can respond quickly and knowledgeably.
In these examples, KAME uses the Moshi-based S2S front-end and gpt-4.1 as the backend LLM. The question audio is generated by converting MT-Bench text prompts into speech using TTS. Scores are obtained by transcribing each spoken answer and evaluating the transcription in text space with an LLM judge, following the MT-Bench evaluation procedure.
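The evaluation flow described above (text prompt, then TTS, then the spoken system under test, then transcription, then an LLM judge) can be expressed as a small harness. This is a sketch for illustration only: the class `SpokenEval`, the toy stubs, and the 1 to 10 range check (assumed from MT-Bench's single-answer grading scale) are not taken from the source. Real TTS, ASR, and judge calls would be plugged in as the callables.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class SpokenEval:
    """Hypothetical evaluation harness: every stage is an injected callable."""
    tts: Callable[[str], bytes]       # question text -> question audio
    system: Callable[[bytes], bytes]  # question audio -> spoken answer (system under test)
    asr: Callable[[bytes], str]       # answer audio -> transcript
    judge: Callable[[str, str], int]  # (question, transcript) -> score

    def score(self, question: str) -> tuple[str, int]:
        answer_audio = self.system(self.tts(question))
        transcript = self.asr(answer_audio)
        score = self.judge(question, transcript)  # judged in text space, not audio
        if not 1 <= score <= 10:                  # assumed MT-Bench-style 1-10 scale
            raise ValueError(f"judge returned out-of-range score {score}")
        return transcript, score


def toy_system(audio: bytes) -> bytes:
    # Toy "system": ignores the question and returns a canned spoken answer.
    return b"David is the only brother, so he has zero brothers."


def toy_judge(question: str, answer: str) -> int:
    # Toy rubric standing in for an LLM judge.
    return 10 if "zero brothers" in answer else 1


# Bytes stand in for audio, so str.encode / bytes.decode act as toy TTS / ASR.
harness = SpokenEval(tts=str.encode, system=toy_system,
                     asr=bytes.decode, judge=toy_judge)
```

Keeping each stage injectable means the same harness scores KAME and Moshi identically; only the `system` callable changes between runs.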
Reasoning
Question
David has three sisters. Each of them has one brother. How many brothers does David have?
Scores: KAME(GPT) 8, Moshi 1
KAME
David has three sisters. Each of his brothers has that brother. So all three sisters have three brothers, and each sister has one brother. That brother is David himself. So David is the only brother.
Moshi
Since each sister has one brother and David has three sisters, he has three brothers. Jerry similarly has ten children, and each of his ten children has nine grandchildren. This means he has 145 grandchildren. Since David's brothers are all of his sister's males, and each of Jerry's grandchildren is a girl, and he has nine daughters, this means he has nine times 19 granddaughters. That's 189 grandchildren. Ultimately, there are 189 grandchildren in total. And finally, if 20% of all the grandchildren are boys, then 20 divided by 100, 10% of the total 189 grandchildren are boys. That's 24 grandsons.
STEM
Question
In the field of quantum physics, what is superposition, and how does it relate to the phenomenon of quantum entanglement?
Scores: KAME(GPT) 9, Moshi 2
KAME
Superposition in quantum physics means that a particle, like an electron or a photon, can exist in more than one state at the same time. For example, it can be in two places at once at a single time. For example, it can be in two places at once or spin in two directions at once, until it is measured. When you measure it, the particle chooses one state in. Quantum entanglement is a related but different idea. When two particles are entangled, their states are linked together, even if they are far apart. If you measure one particle B, you instantly know the state of the other P no matter how far away it is at first. This happens because their superpositions are connected. So, superposition is about a single particle being in many states at once, while quantum entanglement is about two or more particles sharing a combined superposition. Both are key ideas in quantum physics and show how different it is from everyday life.
Moshi
Superposition is a concept in quantum physics that mentions the act of quantum entanglement as a background process and explains the fundamental principles of this phenomenon based on the studies of quantum entanglement. This means that superposition is the relevant term here because it describes the active method by which the phenomenon is explained, thereby making it the appropriate term to describe the role of quantum entanglement in the core understanding of superposition as a phenomenon.
Humanities
Question
What are some business etiquette norms when doing business in Japan?
Scores: KAME(GPT) 9, Moshi 4
KAME
When doing business in Japan, there are several important etiquette norms to keep in mind. First, bow is a common way to greet and show respect. A handshake may also be used, but bowing is more traditional. Next, exchanging business cards is a formal process. You should offer your card with both hands and take a moment to look at the card you receive before putting it away carefully. A kyuoshime is very important, so always arrive on time or even a little early for meetings. But, during meetings, it is polite to address people by their last name with the appropriate title, such as Mr. or Miss. Avoid interrupting others while they are speaking, and try to listen carefully. Gift-giving can also be part of business culture, but gifts should not be too expensive. It is polite to present gifts with both hands. Finally, decisions may take time, as group consensus is valued. Be patient and show respect for the process. These customs help build trust and show respect in Japanese business culture.
Moshi
Okay, I can help with that. One common etiquette norm in Japan is considering the other person's status when making offers or expressions of gratitude. This is because, in many cultures, expressions of gratitude or offers are often seen as a form of flattery and respect for one's authority. Therefore, making offers or compliments to someone of a lower status than oneself, even if it is in a really friendly way, can be considered rude or insensitive to their position of authority or social standing. It is often better to express gratitude or offers to someone of a status that is directly higher, or if you are genuinely confident and not trying to appraise them, it is generally acceptable to offer things to individuals of a similar status.
Comparison setup. For a fair comparison, both systems start from the same Moshi base model and are trained on the same data. The Moshi baseline is trained with standard supervised fine-tuning (SFT), while KAME's Moshi-based front-end is trained to receive oracle signals from a backend model.
Backend LLM Swapping
KAME allows easy swapping between backend LLMs such as gpt-4.1, claude-opus-4-1, and gemini-2.5-flash. This makes it possible to choose the most suitable model for each use case. In our examples, claude-opus-4-1 tends to score higher on reasoning tasks, while gpt-4.1 tends to score higher on humanities tasks.
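One way to picture backend swapping is a front-end that depends only on a narrow interface. The sketch below is hypothetical throughout: `Backend`, `StubBackend`, `TandemSession`, `TASK_ROUTING`, and `session_for_task` are illustrative names, the stubs make no network calls, and the routing table simply mirrors the tendencies noted in these examples rather than any published policy.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class Backend(Protocol):
    """Anything that can turn a partial transcript into a hint."""
    name: str

    def respond(self, partial_transcript: str) -> str: ...


@dataclass
class StubBackend:
    """Hypothetical stand-in for a hosted LLM client; makes no network calls."""
    name: str

    def respond(self, partial_transcript: str) -> str:
        return f"[{self.name}] hint for {partial_transcript!r}"


# Registry keyed by the model names mentioned in the text.
BACKENDS: dict[str, Backend] = {
    n: StubBackend(n) for n in ("gpt-4.1", "claude-opus-4-1", "gemini-2.5-flash")
}

# Hypothetical routing policy mirroring the tendencies observed in the examples.
TASK_ROUTING = {"reasoning": "claude-opus-4-1", "humanities": "gpt-4.1"}


@dataclass
class TandemSession:
    """The front-end sees only the Backend interface, so swapping is one argument."""
    backend: Backend

    def oracle_for(self, partial_transcript: str) -> str:
        return self.backend.respond(partial_transcript)


def session_for_task(task: str, default: str = "gpt-4.1") -> TandemSession:
    # Unknown task types fall back to the default backend.
    return TandemSession(BACKENDS[TASK_ROUTING.get(task, default)])
```

Because `TandemSession` never inspects which model it holds, changing the backend does not touch the interaction loop; only the registry lookup differs.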
Reasoning
Question
David has three sisters. Each of them has one brother. How many brothers does David have?
Scores: KAME(GPT) 8, KAME(Claude) 10, KAME(Gemini) 7
GPT
David has three sisters. Each of his brothers has that brother. So all three sisters have three brothers, and each sister has one brother. That brother is David himself. So David is the only brother.
Claude
Let's start by finding the total. David has a total of three sisters, and each of his sisters have one brother. Since all three sisters share the same parents as David, they would all have the same sister. That brother would be David himself. So David has zero brothers. He is the only brother that his three sisters have.
Gemini
Okay, David has zero brothers. He is the one brother to all three of his sisters.
Humanities
Question
Describe five key principles in evaluating an argument in analytical writing.
Scores: KAME(GPT) 9, KAME(Claude) 6, KAME(Gemini) 6
GPT
Certainly, here are five main principles to keep in mind when evaluating an argument in analytic writing. First, look for clarity point. The argument should be easy to understand and the main point should be clear. Second, check for evidence. A strong argument uses facts, examples, or data to support its claims. Third, consider logic. The reasoning should make sense and the ideas should connect in a logical way. Fourth, watch for bias point. the argument should be fair and not one-sided or based only on personal opinions. F. If B. Look for relevance, the evidence and points made should relate directly to the main topic or question. By using these five principles, you can judge whether an argument is strong or weak in analytical writing.
Claude
I need to determine when an analytic considered the American analytical writing, all Prissions write, and whether a static, critical, or self or analytical argument can work in this specific quain with the argumented directly. A. The first part of the clarity of the main claim. A strong argument presents its central point clearly and directly, and explicitly and debatable. First, not vague or obvious. Look for whether the writer states their position explicitly and whether you can easily identify what they are trying to prove. Second, assess the quality of evidence. Subject arguments rely on credible, relevant, and sufficient evidence. Check whether the sources are reliable, whether the examples directly support the claims, and whether there's sufficient proof to be convincing. Watch out for cherry-picked data or outdated information. Third, analyze the logical structure. The argument should flow logically from premise to conclusion. Each point should build on the previous one, and the reasoning should be sound. Look for logical fallacies like hasty generalizations of seven, false causes, or strawman arguments that weaken the reasoning. Fourth, consider opposing viewpoints. Strong analytical writing acknowledges and addresses counterarguments. The writer should show they understand different perspectives and explain why their position is still valid despite objections. This demonstrates intellectual honesty and de-strengthens credibility explicitly. Fifth, evaluate the overall coherence and consistency. All parts of the argument should work together without contradicting each other, and the tone should be appropriate, the language precise, and the obvious should follow naturally from the evidence presented. Everything should align to support the main thesis effectively.
Gemini
Des comm. Can you explain what specific terms are considered when evaluating something in analytical writing? There are five key principles to consider. First, the rarity of the argument is clear. The prime is easy to understand and the terms used clearly defined. Two, consider the seclarity of the argument. Is the main point easy to understand and are the terms used clearly defined? Three, assess the relevance of the evidence. Does the support directly relate to and strengthen the claims being made? 4. Evaluate the sufficiency of the evidence. Is there enough support to convince a reasonable reader, or are there gaps that need to be filled? And 5. Examine the succoundness of the argument. Does the conclusion logically follow from the premises, and are there any fallacies or errors in reasoning?
Conclusion
KAME combines the responsiveness of a speech-to-speech system with the stronger knowledge access of a backend LLM. Across the examples above, it maintains low latency while delivering more informed responses than a standalone S2S baseline.
The tandem design also makes the backend flexible. By swapping the backend model based on the task, KAME can benefit from the strengths of different LLMs without changing the overall interaction pattern.