Hermes Agent as a personal Jarvis: what voice-driven task management actually looks like in use

Video: "NEW Hermes Jarvis is INSANE!" by Julian Goldie on YouTube.

What "Hermes Jarvis" actually is

There is no product called Hermes Jarvis. What Julian Goldie is showing is Hermes Agent — the open-source AI agent from Nous Research — configured with voice mode on, a persistent system prompt, and a set of skills suited to personal task work. The "Jarvis" label is a shorthand for the end result: a locally running AI assistant you speak to, that speaks back, that remembers what you told it, and that takes action on your behalf.

It is a configuration, not a separate release.

What the demo covers

The video walks through an active working session — giving Hermes spoken instructions, having it respond by voice, and watching it carry out tasks: web lookups, content drafts, memory recall from earlier in the conversation. What makes the demo convincing is not any single feature — it's the combination. Voice in, voice out, memory across turns, actual task completion.

Put those together and it starts to feel like something different from a conventional chat interface.
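As a rough illustration of that combination, here is a minimal Python sketch of a single voice turn. None of it is Hermes Agent's actual API: `VoiceSession`, `transcribe`, `run_agent`, and `speak` are hypothetical stand-ins for whatever speech-to-text, agent, and text-to-speech components a real setup wires together, and the toy agent only echoes and recalls notes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (what you said, what the agent replied)


@dataclass
class VoiceSession:
    """Hypothetical wrapper: transcribe, act, remember, speak."""
    transcribe: Callable[[bytes], str]       # speech-to-text stand-in
    run_agent: Callable[[str, History], str] # agent stand-in, sees earlier turns
    speak: Callable[[str], None]             # text-to-speech stand-in
    history: History = field(default_factory=list)

    def turn(self, audio: bytes) -> str:
        instruction = self.transcribe(audio)               # voice in
        reply = self.run_agent(instruction, self.history)  # task completion
        self.history.append((instruction, reply))          # memory across turns
        self.speak(reply)                                  # voice out
        return reply


# Toy stand-ins so the sketch runs without a microphone or a model.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(instruction: str, history: History) -> str:
    prefix = "remember "
    if instruction.startswith(prefix):
        return "Noted."
    if instruction == "what did I tell you?":
        notes = [said[len(prefix):] for said, _ in history if said.startswith(prefix)]
        return ("You told me: " + "; ".join(notes)) if notes else "Nothing yet."
    return f"Working on: {instruction}"


spoken: List[str] = []  # collects what would be read aloud
session = VoiceSession(fake_transcribe, fake_agent, spoken.append)
session.turn(b"remember the draft is due Friday")
session.turn(b"what did I tell you?")
print(spoken)
```

The point of the sketch is the shape of the loop: each turn passes the accumulated history back to the agent, which is what lets the second instruction be answered from the first.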

Where the comparison to Siri or Alexa breaks down

Consumer voice assistants answer questions. Hermes in this configuration takes on work. The agent uses its skill library to act on your instructions: it can browse, write files, draft content, search your knowledge base, and run SEO checks. The voice interface is just how you give it the instruction.

It is the difference between putting a question to a search engine and handing a task to a colleague. One returns results; the other does something.

The honest caveats

Voice mode is not instant. There is a perceptible gap between speaking and the agent responding: usually a few seconds, occasionally longer on slower hardware or a busy API endpoint. The setup also needs a decent microphone. Hermes's voice recognition happens in software, so it can only work with the signal the microphone delivers, and a poor signal makes the agent guess at instructions more than you would want.

And the quality of the underlying model matters more in voice mode than in text mode, because you cannot skim a confused spoken answer the way you can skim a confused block of text.
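If the gap bothers you, it helps to know which stage is responsible before changing hardware or models. The following is a minimal sketch for timing each stage of a voice turn, assuming the turn can be split into speech-to-text, agent, and text-to-speech steps. The `time_stages` helper and the placeholder stages are hypothetical and are not part of Hermes Agent; you would swap in your own calls.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]  # (label, function applied to the payload)


def time_stages(
    stages: List[Stage],
    payload: Any,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Any, Dict[str, float]]:
    """Run each stage in order and record how long each one took, in seconds."""
    timings: Dict[str, float] = {}
    for name, stage in stages:
        start = clock()
        payload = stage(payload)  # output of one stage feeds the next
        timings[name] = clock() - start
    return payload, timings


# Placeholder stages (assumptions for illustration only).
stages: List[Stage] = [
    ("speech_to_text", lambda audio: audio.decode("utf-8")),
    ("agent", lambda text: f"Working on: {text}"),
    ("text_to_speech", lambda reply: reply.encode("utf-8")),
]

result, timings = time_stages(stages, b"summarise my inbox")
slowest = max(timings, key=timings.get)
print(f"slowest stage: {slowest} ({timings[slowest]:.4f}s)")
```

With real components in place of the placeholders, the per-stage numbers show whether the delay comes from transcription, the model call, or speech synthesis.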

What it is actually useful for right now

Morning task briefings: what's on the list, what came in overnight, what's due before noon. Content drafting sessions where you dictate outlines and have the agent fill them in while your hands are elsewhere. Research runs where you ask questions out loud and listen to spoken summaries. Those are the use cases where a voice-capable Hermes earns its setup time.

Complex multi-tool orchestration is still better handled with typed commands. Over voice, that part of the workflow is not there yet.

Where this connects to NordSys

We install and configure Hermes Agent for clients, including voice mode where it makes sense for a team's workflow. If you want to know whether a voice-capable agent would save meaningful time in your specific situation, the audit call is the right place to start.

See our AI Agents service →