Voice Assistants — Home Assistant on OpenAgTechnology

Voice assistants let a grower ask questions and issue commands hands-free. In the greenhouse with gloves on, in the field away from a phone, or at a packing station with hands full, voice is sometimes the most practical interface. Home Assistant supports two broad categories of voice: local voice, which runs entirely on the operation's hardware with no data leaving the premises, and cloud voice through Google Assistant or Amazon Alexa. Local voice has improved enormously in the last few years and is now good enough for most practical use without sending audio off the premises. Cloud voice is more polished and broader in capability but sends voice data to the respective vendor's cloud. This page covers both paths, the specific hardware options, the agricultural use cases that benefit most from voice, and the failure modes that affect voice deployments. For operations that want local-only operation for privacy or reliability reasons, local voice is a real option.

Local voice fundamentals.

Home Assistant's local voice stack runs entirely on the Home Assistant host (or dedicated hardware) with no cloud dependency.

The components.

- Wake word detection. Recognizes a specific phrase ("Hey Jarvis," "Hey Nabu," etc.) that starts the voice interaction. Runs continuously on a low-power model.
- Speech-to-text (STT). Converts spoken audio to text. Whisper (various sizes) is the common choice.
- Intent parsing. Takes the text and determines what action is being requested. Maps to Home Assistant services.
- Text-to-speech (TTS). Generates audio responses. Piper is the common local choice with natural-sounding voices.

Each component runs locally. Voice data doesn't leave the Home Assistant host.

The hardware.

Local voice is compute-intensive, especially speech-to-text. The hardware requirements depend on which models are used.

Minimum: Raspberry Pi 4 or 5 can run small Whisper models (tiny, base). Recognition accuracy suffers compared to larger models.

Recommended: a mini PC or repurposed desktop with 8+ GB RAM. The graybox hardware the collective already recommends handles local voice well when moderately sized Whisper models are used.

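Enhanced: a dedicated AI accelerator (Coral USB, Intel NPU, GPU). Enables larger models and faster response, provided the chosen speech-to-text software actually supports that accelerator; check before buying. For operations making heavy voice use, worthwhile.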

Separate voice-device hardware (microphone + speaker, placed near the user) connects to Home Assistant over the network. Options covered below.

Home Assistant Voice Preview Edition.

Home Assistant's own hardware product for local voice.

A dedicated hardware device with microphone, speaker, and network connection. Runs Home Assistant's voice stack with local models. Place it near where voice interaction is wanted (kitchen, office, greenhouse entrance).

Modestly priced at release. Designed to work out-of-the-box with Home Assistant. Integrates with the local voice infrastructure on the Home Assistant host.

For operations that want voice control without building custom hardware, this is the easiest path.

Alternatives to Home Assistant Voice Preview Edition.

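ESPHome-based voice devices. DIY or community-built voice satellites using ESP32 hardware with microphones and speakers. A typical build is an ESP32-S3 board with an I2S microphone (such as the INMP441) and an I2S amplifier (such as the MAX98357A). These devices connect to Home Assistant through ESPHome's voice assistant component rather than through Wyoming Satellite software.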

Raspberry Pi-based voice satellites. Older but still viable — a Raspberry Pi with USB microphone and speakers running Wyoming Satellite software.

Repurposed smart speakers. Some older smart speakers (with cloud services discontinued) can be repurposed with community firmware as local voice devices. More complex, less reliable.

Phone or tablet as voice device. The Home Assistant mobile app has voice features. A wall-mounted tablet running the app becomes a voice interface without additional hardware.

The Wyoming protocol.

Wyoming is the protocol that connects voice satellites (the microphone/speaker hardware) to the voice processing services (STT, intent, TTS) on the Home Assistant host.

Wyoming is an open, documented protocol. Implementations exist for Raspberry Pi, ESP32, and other platforms, which means voice-device hardware can come from many sources and work with the same Home Assistant voice backend.
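To give a feel for what travels between a satellite and the host, here is a minimal sketch of Wyoming-style event framing. It assumes the framing described in the protocol's public README: a single-line JSON header, optionally followed by a binary payload whose length is declared in the header. The helper names `encode_event` and `decode_event` are hypothetical, and the framing is simplified (the optional separate data section is left out). Real deployments should use the `wyoming` Python package rather than this sketch.

```python
import io
import json
from typing import BinaryIO, Optional, Tuple


def encode_event(event_type: str, data: Optional[dict] = None, payload: bytes = b"") -> bytes:
    """Serialize one event: JSON header line, then the raw payload bytes."""
    header = {"type": event_type, "data": data or {}}
    if payload:
        # The header announces how many payload bytes follow the newline.
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + payload


def decode_event(stream: BinaryIO) -> Tuple[str, dict, bytes]:
    """Read one event from a binary stream and return (type, data, payload)."""
    line = stream.readline()
    if not line:
        raise EOFError("stream closed before an event header arrived")
    header = json.loads(line.decode("utf-8"))
    length = header.get("payload_length") or 0
    payload = stream.read(length) if length else b""
    if len(payload) != length:
        raise EOFError("stream closed in the middle of a payload")
    return header["type"], header.get("data", {}), payload


if __name__ == "__main__":
    # Round-trip a fake audio chunk through an in-memory stream.
    frame = encode_event("audio-chunk", {"rate": 16000, "width": 2, "channels": 1}, b"\x00\x01")
    print(decode_event(io.BytesIO(frame)))
```

Because the payload length is declared up front, the reader never has to scan audio bytes for a delimiter, which keeps a satellite implementation small enough for modest hardware.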

Why this matters.

The grower isn't locked to a specific vendor for voice hardware. Home Assistant Voice Preview Edition is one option; DIY ESP32 satellites are another; future third-party products that speak Wyoming would all work. This standards-based approach matches the collective's open-ecosystem preference.

Cloud voice options.

For operations that don't mind cloud integration, there are two major options, plus an indirect route through Apple HomeKit.

Google Assistant.

Home Assistant Cloud provides Google Assistant integration. Alternatively, self-configured integration is possible but more complex.

Voice queries go through Google Assistant; Google queries Home Assistant for responses; responses come back through Google.

Capability: broad, polished. Google Assistant handles general voice queries (weather, news, timers) plus Home Assistant-specific queries (greenhouse temperature, triggering automations).

Privacy: voice recordings go to Google per their standard policies.

Amazon Alexa.

Similar setup through Home Assistant Cloud. Alexa handles voice queries; integrates with Home Assistant for Home Assistant-specific commands.

Same capability and privacy tradeoffs as Google Assistant.

Apple Siri (HomeKit).

Indirect integration. Home Assistant entities exposed through HomeKit can be addressed through Siri on Apple devices. More limited capability than direct Google or Alexa integration but works for operations already using Apple hardware.

Voice commands for agriculture.

Specific ways voice is useful in agricultural operations.

Status queries.

"Hey Nabu, what's the temperature in Greenhouse 2?" Voice query returns the current reading. Useful for quick checks without opening the phone or tablet.

"How's humidity in propagation?" Same pattern for other readings.

"Has irrigation run today?" Queries automation history.

"What's the forecast?" Returns weather forecast.

Voice queries work well for one-value status checks that a grower might otherwise ask a dashboard.
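A one-value status check amounts to reading a single entity state and turning it into a sentence. The sketch below shows that shape using Home Assistant's REST API (`GET /api/states/<entity_id>` with a long-lived access token). The host name, token, and entity ID are placeholders, and the helper names are hypothetical; the built-in voice pipeline does this work internally, so this is only an illustration for scripting or testing.

```python
import json
import urllib.request


def build_state_request(base_url: str, token: str, entity_id: str) -> urllib.request.Request:
    """Build (but do not send) a request for one entity's current state."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/states/{entity_id}",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def spoken_reading(state: dict) -> str:
    """Turn a state object into a short sentence suitable for text-to-speech."""
    attrs = state.get("attributes", {})
    name = attrs.get("friendly_name", state.get("entity_id", "sensor"))
    value = state.get("state")
    if value in (None, "unknown", "unavailable"):
        return f"{name} is not reporting right now."
    unit = attrs.get("unit_of_measurement", "")
    return f"{name} is {value} {unit}".strip() + "."


def fetch_state(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (needs a reachable host)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder values; replace with the operation's own host, token, and entity.
    req = build_state_request("http://homeassistant.local:8123", "TOKEN", "sensor.greenhouse_2_temperature")
    print(req.full_url)
```

Handling the "unknown" and "unavailable" states explicitly matters for voice: a spoken "Greenhouse 2 is unavailable degrees" is worse than no answer.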

Commands.

"Hey Nabu, turn on Greenhouse 2 exhaust fan." Direct control of automations or devices.

"Start propagation irrigation." Triggers a scheduled automation manually.

"Set Greenhouse 1 temperature setpoint to 78 degrees." Changes an input_number or automation parameter.

"Disable alerts for one hour." Suppresses alerting temporarily during maintenance.

Voice commands are most useful in hands-busy situations — while holding product, in gloves, while working.
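A setpoint command ends up as a service call. The sketch below builds the REST equivalent of "set Greenhouse 1 temperature setpoint to 78 degrees" using the `input_number.set_value` service, with a range check so that a misheard number (178 instead of 78) is rejected before it reaches Home Assistant. The entity ID, host, token, and the 50 to 95 range are illustrative assumptions, and `build_setpoint_call` is a hypothetical helper.

```python
import json
import urllib.request


def build_setpoint_call(
    base_url: str, token: str, entity_id: str, value: float, low: float, high: float
) -> urllib.request.Request:
    """Build (but do not send) a POST to the input_number.set_value service."""
    if not low <= value <= high:
        # Reject values that are more likely a recognition error than an intent.
        raise ValueError(f"{value} is outside the allowed range {low} to {high}")
    body = json.dumps({"entity_id": entity_id, "value": value}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/services/input_number/set_value",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_setpoint_call(
        "http://homeassistant.local:8123", "TOKEN",
        "input_number.greenhouse_1_setpoint", 78, low=50, high=95,
    )
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```

The `input_number` helper enforces its own minimum and maximum as well; the extra check here simply fails earlier and with a clearer message.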

Alerts.

Voice can announce alerts over speakers at key locations. "Alert: Greenhouse 2 humidity is 88 percent and rising." Useful in facilities where workers may not have phones on them.
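As a sketch of how such an announcement could be assembled, the helpers below format the spoken sentence and build one `tts.speak` service payload per speaker. The `tts.speak` field names (`entity_id`, `media_player_entity_id`, `message`) follow Home Assistant's documented service; the entity IDs `tts.piper` and `media_player.packhouse_speaker` are assumptions, and both helper names are hypothetical. Each payload would be sent as the JSON body of a call to `/api/services/tts/speak` or used as the data of an automation action.

```python
from typing import Dict, List


def alert_message(location: str, measurement: str, value: float, unit_spoken: str, rising: bool) -> str:
    """Format an alert sentence with the unit spelled out for text-to-speech."""
    trend = " and rising" if rising else ""
    # ":g" drops a trailing ".0" so 88.0 is spoken as "88".
    return f"Alert: {location} {measurement} is {value:g} {unit_spoken}{trend}."


def build_tts_calls(tts_entity: str, speakers: List[str], message: str) -> List[Dict[str, str]]:
    """Return one tts.speak payload per speaker so a failure on one does not block the rest."""
    return [
        {"entity_id": tts_entity, "media_player_entity_id": speaker, "message": message}
        for speaker in speakers
    ]


if __name__ == "__main__":
    text = alert_message("Greenhouse 2", "humidity", 88, "percent", rising=True)
    for payload in build_tts_calls("tts.piper", ["media_player.packhouse_speaker"], text):
        print(payload)
```

Spelling out "percent" and "degrees" in the message, rather than passing symbols, avoids depending on how a particular TTS voice reads "%" or "°".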

Reminders and timers.

"Remind me to check propagation in an hour." "Set a 20-minute timer for the irrigation cycle." Voice is faster than typing for these.

Deployment patterns.

How voice assistants fit into agricultural operations.

Office or packhouse voice.

A voice device in the office or packhouse where the grower spends significant time. Used for queries and commands while working on other tasks.

Greenhouse-entrance voice.

A voice device at each greenhouse entrance. Hands-free status check as the grower enters. "How's this zone doing?"

Field voice via mobile app.

Phone-based voice commands for operations in the field. Requires the mobile app with its voice feature enabled. Less reliable than dedicated devices but works without additional hardware.

Announcement-only voice.

A speaker at key locations that announces alerts without requiring user input. One-way voice output. Useful where workers may not see phones but are within earshot.

Multi-room voice.

Multiple voice devices across the facility, each addressable individually. Queries directed at "office voice" vs "greenhouse voice" produce different responses or actions based on location.

Voice assistant failure modes.

Specific problems.

The wake word that triggered false positives. A wake word phonetically similar to common conversation triggered the voice assistant repeatedly during normal talking. Audio captures and responses happened constantly. Fix: different wake word less likely to be triggered accidentally, or sensitivity adjustment.

The background noise that confused STT. Greenhouse fans and equipment noise made speech-to-text unreliable. Commands failed or got misinterpreted. Fix: better microphone placement away from equipment, noise-canceling microphone hardware, or accepting the reliability trade-off in noisy environments.

The model that was too slow. Response time on underpowered hardware was 10+ seconds. Users stopped using voice because of the latency. Fix: more capable hardware, smaller/faster models, GPU or AI accelerator.
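Latency is easy to measure before users give up on voice. The sketch below times any callable that runs one full request (for example, a function that sends a test sentence to the host and waits for the reply) and reports the median and worst case. The 3-second budget is an illustrative assumption, not a Home Assistant figure, and the helper names are hypothetical.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(run_once: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Time several runs of a callable and summarize the results in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return {"median_s": statistics.median(samples), "worst_s": max(samples)}


def within_budget(stats: Dict[str, float], budget_s: float = 3.0) -> bool:
    """Judge by the worst case, since users remember the slow responses."""
    return stats["worst_s"] <= budget_s


if __name__ == "__main__":
    # Stand-in workload; replace with a real round trip to the voice pipeline.
    stats = measure_latency(lambda: time.sleep(0.05))
    print(stats, "OK" if within_budget(stats) else "too slow")
```

Running the same measurement with different Whisper model sizes gives a concrete basis for choosing between accuracy and speed on a given host.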

The intent parser that didn't understand the phrasing. "Turn off the greenhouse fans" worked; "Shut off the exhaust" didn't, because the intent parser didn't map to the right service. Fix: custom intents (Home Assistant supports defining custom voice command handlers), or grower adapts phrasing to what works.
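Phrasing gaps can be found ahead of time by sending candidate sentences as text to Home Assistant's conversation endpoint (`POST /api/conversation/process`) and checking the response type. The sketch below builds the request and classifies responses; the `response_type` values checked are the ones described in the developer documentation, while the host, token, and helper names are assumptions. A recognized command sent this way is actually executed, so test against harmless entities.

```python
import json
import urllib.request
from typing import Dict, List


def build_conversation_request(base_url: str, token: str, text: str, language: str = "en") -> urllib.request.Request:
    """Build (but do not send) a text request to the conversation agent."""
    body = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/conversation/process",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def was_understood(response_json: dict) -> bool:
    """True when the agent performed an action or answered a query."""
    kind = response_json.get("response", {}).get("response_type")
    return kind in ("action_done", "query_answer")


def failing_phrases(results: Dict[str, dict]) -> List[str]:
    """Given phrase -> decoded response, list the phrases that were not understood."""
    return [phrase for phrase, response in results.items() if not was_understood(response)]


if __name__ == "__main__":
    # Canned responses standing in for real replies from the host.
    canned = {
        "Turn off the greenhouse fans": {"response": {"response_type": "action_done"}},
        "Shut off the exhaust": {"response": {"response_type": "error"}},
    }
    print(failing_phrases(canned))
```

Each phrase that fails is a candidate for a custom sentence or an entity alias.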

The cloud voice that went down. Google Assistant or Alexa service outage affected voice for hours. Fix: local voice as backup, non-voice alternatives always available.

The Wyoming satellite that dropped. Voice satellite lost network connection; wake word detection continued locally (if supported) but commands didn't reach the Home Assistant host. Fix: wired Ethernet or strong WiFi coverage at voice satellite locations; monitor satellite connectivity.
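Monitoring satellite connectivity can be as simple as a periodic TCP probe. The sketch below assumes satellites that listen on a TCP port; 10700 is used here as the assumed default for the Wyoming Satellite software, and the host names are placeholders. Satellites that connect outward to Home Assistant instead of listening would need a different check, such as watching the device's entity state.

```python
import socket
from typing import Dict, List, Tuple


def satellite_reachable(host: str, port: int = 10700, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the satellite can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_all(satellites: List[Tuple[str, int]]) -> Dict[str, bool]:
    """Probe every (host, port) pair and report which ones answered."""
    return {f"{host}:{port}": satellite_reachable(host, port) for host, port in satellites}


if __name__ == "__main__":
    # Placeholder host names; replace with the facility's satellites.
    print(check_all([("greenhouse-1-voice.local", 10700), ("packhouse-voice.local", 10700)]))
```

A probe only shows that the port answers, not that the microphone works, so it complements rather than replaces an occasional spoken test.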

The TTS that mispronounced names. Custom sensor names ("Greenhouse 2 Zone A") were pronounced oddly by text-to-speech. Responses were hard to understand. Fix: TTS voice tuning, pronunciation hints in the name metadata, or accept minor quirks.

The privacy leak from cloud voice. Operation had sensitive agricultural data; discovered that cloud voice was recording names and queries that were stored in the cloud. Unacceptable for the operation. Fix: migrate to local voice, accept the capability trade-offs.

What not to do.

Don't assume voice works in noisy environments. Greenhouses and packhouses have equipment noise that challenges speech recognition. Test voice in the actual environment before deploying widely.

Don't rely on voice for critical actions. Voice recognition has error rates. For safety-critical actions, physical buttons or confirmed actions are more reliable.
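One way to apply this is a confirmation gate: voice can request a critical action, but the action only runs once it has been confirmed through another channel (a physical button, a dashboard tap). The sketch below is a policy illustration only; the action names, the `CRITICAL_ACTIONS` set, and `handle_voice_action` are all hypothetical and not part of Home Assistant.

```python
from typing import Callable, Set

# Hypothetical list of actions that must never run on voice alone.
CRITICAL_ACTIONS: Set[str] = {"disable_alerts", "start_irrigation"}


def handle_voice_action(action: str, confirmed: bool, execute: Callable[[str], None]) -> str:
    """Run non-critical actions immediately; hold critical ones until confirmed."""
    spoken_name = action.replace("_", " ")
    if action in CRITICAL_ACTIONS and not confirmed:
        return f"Please confirm {spoken_name} on the panel."
    execute(action)
    return f"Done: {spoken_name}."


if __name__ == "__main__":
    log = []
    print(handle_voice_action("start_irrigation", confirmed=False, execute=log.append))
    print(handle_voice_action("start_irrigation", confirmed=True, execute=log.append))
    print(log)
```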

Don't use cloud voice for sensitive operations. If operational data shouldn't leave the premises, local voice is the appropriate choice.

Don't expect voice to replace dashboards. Voice is great for specific queries and commands; dashboards handle complex information better. The two complement each other.

Don't skip wake-word testing. A wake word that triggers falsely becomes annoying quickly. Test in actual use conditions.

Don't forget voice is public. A voice command or response in a shared space is heard by everyone there. Consider privacy of voice interactions.

Don't over-invest in voice if it isn't heavily used. Voice is a tool; some operations use it heavily, others rarely. Don't build elaborate voice infrastructure if the actual use won't justify it.