Video: "NEW Hermes Jarvis is INSANE!" by Julian Goldie on YouTube.
What "Hermes Jarvis" actually is
There is no product called Hermes Jarvis. What Julian Goldie is showing is Hermes Agent — the open-source AI agent from Nous Research — configured with voice mode on, a persistent system prompt, and a set of skills suited to personal task work. The "Jarvis" label is a shorthand for the end result: a locally running AI assistant you speak to, that speaks back, that remembers what you told it, and that takes action on your behalf.
It is a configuration, not a separate release.
What the demo covers
The video walks through an active working session — giving Hermes spoken instructions, having it respond by voice, and watching it carry out tasks: web lookups, content drafts, memory recall from earlier in the conversation. What makes the demo convincing is not any single feature — it's the combination. Voice in, voice out, memory across turns, actual task completion.
Put those together and it starts to feel like something different from a conventional chat interface.
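To make that combination concrete, here is a minimal Python sketch of a single voice turn. It is an illustration, not Hermes Agent's actual code or API: `VoiceSession`, `draft_skill`, and `recall_skill` are hypothetical names, speech-to-text is assumed to have already produced the transcript string, and the returned string stands in for what a text-to-speech step would read aloud.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical skill signature: (transcript, memory) -> reply text.
Skill = Callable[[str, List[str]], str]


@dataclass
class VoiceSession:
    """Toy model of a voice session: transcript in, skill call, reply out."""
    skills: Dict[str, Skill]
    memory: List[str] = field(default_factory=list)

    def handle(self, transcript: str) -> str:
        # Store every instruction so later turns can refer back to it.
        self.memory.append(transcript)
        # Run the first skill whose trigger word appears in the transcript.
        for trigger, skill in self.skills.items():
            if trigger in transcript.lower():
                return skill(transcript, self.memory)
        return "Sorry, I have no skill for that yet."


def draft_skill(transcript: str, memory: List[str]) -> str:
    # Stand-in for real task work such as drafting content.
    return f"Drafted content for: {transcript}"


def recall_skill(transcript: str, memory: List[str]) -> str:
    # The last memory entry is the current request, so report what came before.
    earlier = memory[:-1]
    if not earlier:
        return "Nothing yet."
    return "Earlier you said: " + "; ".join(earlier)


session = VoiceSession(skills={"draft": draft_skill, "recall": recall_skill})
print(session.handle("Draft an outline about voice agents"))
print(session.handle("Recall what I asked"))
```

The point of the sketch is the shape of the loop: each spoken turn is remembered, routed to something that does work, and answered. A real agent would choose skills with a language model rather than keyword matching.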
Where the comparison to Siri or Alexa breaks down
Consumer voice assistants answer questions. Hermes in this configuration takes on work. The agent uses its skill library to act on your instructions — it can browse, write files, draft content, search your knowledge base, run SEO checks. The voice interface is just how you give it the instruction.
It is the difference between typing a question into a search engine and handing a task to a colleague. One returns results; the other does the work.
The honest caveats
Voice mode is not instant. There is a perceptible gap between speaking and the agent responding: usually a few seconds, occasionally longer on slower hardware or a busy API endpoint. The setup also needs a decent microphone. Hermes's speech recognition happens in software, so a poor input signal leaves the agent guessing at your instructions more often than you would want.
The quality of the underlying model also matters more in voice mode than in text mode, because you cannot skim a confused spoken response the way you can skim a confused block of text.
What it is actually useful for right now
Morning task briefings: what's on the list, what came in overnight, what's due before noon. Content drafting sessions where you dictate outlines and have the agent fill them in while your hands are elsewhere. Research runs where you ask questions out loud and listen to spoken summaries. Those are the use cases where a voice-capable Hermes earns its setup time.
Complex multi-tool orchestration over voice is still better handled with typed commands — that part of the workflow is not there yet.
Where this connects to NordSys
We install and configure Hermes Agent for clients, including voice mode where it makes sense for a team's workflow. If you want to know whether a voice-capable agent would save meaningful time in your specific situation, the audit call is the right place to start.
See our AI Agents service →