Dlexa the Hedge Dragon

Getting Dlexa to Hear You

This page is about making hedge dragon ears (microphones) that work. For Dlexa to have a conversation she needs to pick up her subjects' voices clearly. Any noise makes the STT (speech to text) processing more difficult and less accurate.

Microphones are delicate membranes that detect the air pressure changes from a spoken voice 6 to 8 feet away. This is not something that does well or survives long in the wind and rain.

Microphones are cheap and available in a variety of different technologies. Microphone "houses" are complex, and making a good one is as much luck and art as it is science.

Microphone Technology Choice

Microphones have been around for over a hundred years. Electret (condenser) microphones are a good choice because they are small and have a nearly flat response to sound. Being small, an Electret plus its electronics can be housed in a pea-sized container. The task then is to mount this little can in a housing that keeps it dry, protects it from noise (like wind), and enhances its sensitivity to human voice over other sounds.

Enhancing voice involves adding electronics. This can become elaborate and expensive. Microphone arrays with four or more microphones feeding a digital signal processor can not only "focus" on the source of the voice, they can even identify its location. This means they can track multiple speakers like a radar system. They can also be rendered useless by a poorly designed "house".

For Dlexa the design assumes a single speaker standing directly in front of the hedge dragon. All we need is an amplified Electret with a low pass filter to concentrate on the frequencies of a human voice. And, of course, a "house" that only picks up sound from a "narrow cone" directly in front, with a "muff" to absorb wind noise.
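As a rough illustration of what the low pass filter does, here is a minimal software sketch of a single-pole (RC-style) filter. It is not the filter used in Dlexa. The 16 kHz sample rate, the 3400 Hz cutoff (the usual upper edge of telephone-quality speech), and the two test tones are all assumptions picked for the example.

```python
import math

SAMPLE_RATE_HZ = 16000  # assumed sample rate, not from the build
CUTOFF_HZ = 3400        # assumed cutoff: upper edge of telephone-quality speech


def low_pass(samples, sample_rate_hz, cutoff_hz):
    """Single-pole low-pass filter, the digital cousin of a resistor-capacitor filter."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # RC time constant for this cutoff
    dt = 1.0 / sample_rate_hz               # time between samples
    alpha = dt / (rc + dt)                  # smoothing factor between 0 and 1

    filtered = []
    previous = 0.0
    for x in samples:
        # Move part of the way toward the new sample; fast wiggles get smoothed out.
        previous = previous + alpha * (x - previous)
        filtered.append(previous)
    return filtered


def tone(freq_hz, sample_rate_hz, count):
    """Pure sine tone used as a stand-in for real audio."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(count)]


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# A 300 Hz tone stands in for voice, a 7000 Hz tone for high-pitched hiss.
voice_like = low_pass(tone(300, SAMPLE_RATE_HZ, 1600), SAMPLE_RATE_HZ, CUTOFF_HZ)
hiss_like = low_pass(tone(7000, SAMPLE_RATE_HZ, 1600), SAMPLE_RATE_HZ, CUTOFF_HZ)

print("voice-like level after filter:", round(rms(voice_like), 3))
print("hiss-like level after filter: ", round(rms(hiss_like), 3))
```

The voice-like tone passes nearly untouched while the hiss-like tone is clearly reduced. A single pole is a gentle filter; a steeper one would cut more, at the cost of more parts or more math.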

The hobbyist community has a very popular Electret module with a MAX4466 adjustable gain amplifier. Five of these were purchased from China for $8 (including shipping).

The microphone, camera (ESP32-CAM) and a microwave (RCL-0515) proximity detector can be supported by a single styrofoam block and inserted into a small craft bird house from a dollar store.

Easy in Theory

Getting a microphone sound-isolated from the house, waterproofed, and fitted with an anti-wind muff is hard. The first attempt was simply to use an old web-cam mic, but that died after 3 sunny days. An outside installation is much harsher than it looks.

After a number of failed attempts the current "MicHouse" design is still under stress testing (Oct 2023) and might be in service in a month or so. Then we'll see how it survives Winter and Spring.

Things That Did Not Work

To preface: things that did not work for me. This is where the complexities of the practical problem overwhelm the simple solutions. You may have "better ideas" but I challenge you to prove them by building your own low noise mic (and I'll happily steal your better design).

The two mic solution, where you subtract the backward pointing mic (background noise) from the forward pointing mic (signal), failed because there is no good way to set the amplification on the two signals. Unity gain, [signal x 1.0] - [noise x 1.0], creates a whole new class of noises that confuse the STT. Trying to fix this with automatic gain control, [signal] - [noise x (average signal)/(average noise)], makes things worse. I did not try it, but I'm sure adding a mixture of low and high pass filters would not be any better.
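The two formulas can be sketched in a few lines. This is a toy with synthetic tones, not a recording from the hedge, and it only shows one possible way the subtraction can go wrong (my assumption, not something measured): the noise reaches the rear mic slightly later than the front mic. The sample rate, tone frequencies, and the 8-sample delay are all made up for the example, and the delay is deliberately a worst case.

```python
import math

SAMPLE_RATE_HZ = 16000  # assumed
DELAY_SAMPLES = 8       # assumed extra travel time to the rear mic (0.5 ms)
N = 1600


def voice_at(i):
    """200 Hz tone standing in for the speaker's voice."""
    return math.sin(2.0 * math.pi * 200 * i / SAMPLE_RATE_HZ)


def noise_at(i):
    """1000 Hz tone standing in for background noise."""
    return 0.5 * math.sin(2.0 * math.pi * 1000 * i / SAMPLE_RATE_HZ)


voice = [voice_at(i) for i in range(N)]
front = [voice_at(i) + noise_at(i) for i in range(N)]   # signal mic: voice plus noise
back = [noise_at(i - DELAY_SAMPLES) for i in range(N)]  # noise mic: delayed noise only


def mean_abs(samples):
    """Average absolute level, used here as the 'average' in the autogain formula."""
    return sum(abs(s) for s in samples) / len(samples)


def subtract_unity(front, back):
    """[signal x 1.0] - [noise x 1.0]"""
    return [f - b for f, b in zip(front, back)]


def subtract_autogain(front, back):
    """[signal] - [noise x (average signal)/(average noise)]"""
    gain = mean_abs(front) / mean_abs(back)
    return [f - gain * b for f, b in zip(front, back)]


def leftover_noise(output):
    """How far the output is from the clean voice, on average."""
    return mean_abs([o - v for o, v in zip(output, voice)])


noise_before = leftover_noise(front)
noise_unity = leftover_noise(subtract_unity(front, back))
noise_autogain = leftover_noise(subtract_autogain(front, back))

print("noise in front mic alone:", round(noise_before, 3))
print("after unity subtraction: ", round(noise_unity, 3))
print("after autogain:          ", round(noise_autogain, 3))
```

With this delay the rear mic sees the noise upside down, so subtracting it roughly doubles the noise instead of removing it, and the autogain version scales the rear mic up and adds even more. Real noise has many frequencies and directions, so some of it will always line up badly like this.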

I looked at implementing my own DSP (Digital Signal Processing). A DIY DSP starts with a digital version of [signal mic] - [noise mic(s)] that you can enhance with digital (FFT) versions of various high pass, low pass, and notch filters. The commercial DSPs depend on years of development and advanced audio math to work their magic, and I was unwilling to become a DSP expert.
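For a sense of what a "digital (FFT) version" of a filter means, here is a minimal sketch: transform a block of audio into frequencies, zero the ones outside the voice band, and transform back. It uses a slow textbook DFT so it runs with nothing but the standard library; anything real would use a proper FFT library. The 8 kHz sample rate, the 300 to 3400 Hz band, and the test tones are assumptions for the example.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000  # assumed
N = 160                # block length, gives 50 Hz per frequency bin


def dft(samples):
    """Textbook discrete Fourier transform (slow, but no libraries needed)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse transform, keeping only the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def band_pass(samples, sample_rate_hz, low_hz, high_hz):
    """Keep only frequency bins between low_hz and high_hz."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins above n/2 are the mirror image (negative frequencies) of those below.
        freq = min(k, n - k) * sample_rate_hz / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    return inverse_dft(spectrum)


# A 500 Hz tone stands in for voice, a 50 Hz tone for low rumble.
tone = [math.sin(2.0 * math.pi * 500 * t / SAMPLE_RATE_HZ) for t in range(N)]
rumble = [0.8 * math.sin(2.0 * math.pi * 50 * t / SAMPLE_RATE_HZ) for t in range(N)]
noisy = [a + b for a, b in zip(tone, rumble)]

cleaned = band_pass(noisy, SAMPLE_RATE_HZ, 300, 3400)
worst_error = max(abs(c - t) for c, t in zip(cleaned, tone))
print("largest difference from the clean tone:", worst_error)
```

This works neatly only because both test tones sit exactly on a frequency bin. Real audio smears across bins and arrives as a continuous stream, which is where the "years of development" start.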

Perhaps an AI DSP? Take several thousand hours of raw audio and train an AI model to enhance the signal and remove the noise. Again this is probably years of effort and a deep understanding of advanced audio math.

Wait a minute. This has already been done. It is called an STT (Speech to Text) and it is already the (free) front end for Dlexa. Putting an AI DSP before an AI STT is likely to make things worse. Giving the AI STT a "good enough" signal with obvious physical "house noise reduction" (mic device suspension and a wind muff) and a low pass filter is all that is really needed.

Next

Once this part of the "sound infrastructure" is in place, the Linux server running Mycroft can listen. That done, what Dlexa and Pearl hear becomes (just?) a programming problem.

The next task is to connect the permanent "Microphone House" wiring (under the hedge body) for power, WiFi, and USB. See Hedge for the cabling plans.