Dlexa the Hedge Dragon

A primer on what AI can and cannot do

AI Links
  Wikipedia Definition
  Hawking's Warning
 

AI is neither Artificial nor Intelligent.

The Wikipedia link (above) is balanced: it insists AI is a kind of "machine intelligence" as opposed to a human or "animal intelligence". The Stephen Hawking link takes the more popular view that AI robots will take over the world and kill us all... and that must be true; it's in all the movies.

Sorry. If you've ever tried to build an AI you'll discover how frustratingly limited they are. They give the illusion of intelligence, but only because humans are so good at filling in the gaps.

I'll start with the story of Clever Hans, the horse that could do math. Hans was a sensation and travelled from town to town showing off his math skills. In 1907 an investigation showed that Hans could not do math at all: he was reading involuntary cues from his trainer and simply tapping out the answers with his hoof.

Okay, just one more exposed magic trick. But the "Clever Hans Effect" has become a topic of serious research in both psychology and computing science.

When you start poking around in the guts of something like Mycroft you begin to understand two things: AI is just Clever Hans, and more importantly, you are the trainer. If you do it right the AI appears to have human intelligence. And there is profit in having everyone believe the AI you built is truly intelligent.

To understand the limitations of an AI you need to understand that AI should really stand for Arithmetic Interpolation.

The Arithmetic Part

The basis for the current best AI math is the Convolutional Neural Network (CNN). There are two parts to this method: the "convolution" and the "neural network".

A convolution is a very general operation that combines two functions to create a third: you slide one function across the other, multiplying and summing as you go. The two functions can be anything, and if you pick two at random the result will probably be of no use (garbage).

If your first function is something you recognize, like a picture or an audio recording, you can choose the second function to be a filter; the third function (the output) is then a "better" version. For example you could make a picture of a sunset have more reds and oranges, or remove the crackles and pops from a song on an old vinyl record. If you are interested in the math of the process there are lots of simple tutorials.
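
To make that concrete, here is a tiny Python sketch of the vinyl record example. The signal and the pops are invented; the filter is just a plain 5-point moving average:

import numpy as np

signal = np.sin(np.linspace(0, 4 * np.pi, 200))     # a clean tone (the first function)
signal[[50, 120, 180]] += 2.0                       # three crackles on the old record

kernel = np.ones(5) / 5                             # the filter (the second function)
cleaned = np.convolve(signal, kernel, mode="same")  # the convolution (the third function)

# the pop at sample 50 shrinks to roughly a fifth of its size
print(round(float(signal[50]), 2), "->", round(float(cleaned[50]), 2))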

If you know what matrix multiplication is, skip this paragraph: in the digital world a convolution boils down to multiplying two matrices together. You multiply each of the numbers in the top row of the first matrix by the corresponding numbers in the first column of the second matrix and add them all together. That "sum of products" fills the top left corner cell of the third matrix. You then do each row of the first matrix against each column of the second matrix until you fill all the cells in the third matrix. (See: Matrix Multiplication.)
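
In code, the "sum of products" rule looks like this (a bare-bones Python sketch; real AI libraries do exactly this, just enormously faster):

def matmul(A, B):
    # Multiply matrix A (m x n) by matrix B (n x p) the long way.
    m, n, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(m)]
    for i in range(m):            # each row of A...
        for j in range(p):        # ...against each column of B
            for k in range(n):    # the "sum of products"
                C[i][j] += A[i][k] * B[k][j]
    return C

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# prints [[19, 22], [43, 50]]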

When training an AI the first function is of course the input training set. The second function is a set of filters. Getting the right size and values for each of the cells in these filters can make a huge difference in how well the training works. Better filters mean better identification of the elements and better separation, so one element will not be confused with another. This yields a better translation of inputs to classifications when the trained AI runs.

A Neural Network is a way of taking many inputs and routing them to many outputs. In simple terms a NN takes inputs and sorts them into categories. The work is done in a layered structure where weights at each node direct the flow of data from the input nodes through the hidden layers to the output nodes. Each node just multiplies the incoming data values by its weights and passes the result onward based on a threshold. Again this is elementary school math where you just multiply two numbers and make an "is it bigger or is it smaller" decision.
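
Here is a toy forward pass in Python showing how little arithmetic each node does: multiply, add, and make the bigger-or-smaller decision. The weights are random placeholders; a trained network would have learned values instead.

import numpy as np

def step(x):
    return (x > 0).astype(float)       # the "is it bigger or smaller" decision

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 4))     # 3 inputs -> 4 hidden nodes (placeholder weights)
W_output = rng.normal(size=(4, 2))     # 4 hidden nodes -> 2 output categories

x = np.array([0.5, -1.2, 0.3])         # one input sample
hidden = step(x @ W_hidden)            # multiply by the weights, then threshold
output = step(hidden @ W_output)       # and again for the output layer
print(output)                          # which output categories fired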

So all the math in a CNN is just elementary school arithmetic. How does it get smart? The CNN is a massive arithmetic problem. During training the input is a large sample of all possible inputs the AI will ever see. For an STT (speech-to-text) system that could be hours of recorded voice covering thousands of possible sentences. This is passed through a convolution process that creates a pool of identifiable chunks. These are used to set the weighting values of each of the nodes in a many-layered neural network.

But smart? It's smart like a dictionary is smart or an instruction manual is smart. It has almost all the answers to all the questions you can ask it. For example, let's say I wrote a dictionary where the entry for "stupid" was "See idiot", and when you looked up "idiot" the entry was "What, you again?" So the dictionary told a joke. You could write an AI that, like that dictionary, could tell jokes. Just train it on all the possible jokes and it would make a joke out of everything you said to it. The AI is no different from the joke-telling dictionary. It is just massively more complex.
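
The joke-telling dictionary really is just a lookup table. In Python:

lookup = {"stupid": "See idiot", "idiot": "What, you again?"}

entry = lookup["stupid"]        # "See idiot"
while entry.startswith("See "):
    entry = lookup[entry[4:]]   # follow the cross-reference
print(entry)                    # "What, you again?" -- the dictionary tells its joke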

To repeat: an AI is a very complex classification engine but the math is simple arithmetic. During training it has to do trillions of simple calculations for many hours to calculate the parameter values. It takes billions of calculations to run that classifier and turn a spoken "what's the weather tomorrow?" into a spoken weather forecast.

But it is not magic .... it's just arithmetic.

The Interpolation Part

When an AI is trained it is given a very large number of samples. To train a voice you need something like five hours of audio of a speaker saying several thousand sentences that cover a vocabulary of 20 to 30 thousand words. That training set defines the bounds of its database. If it must speak a word that is not in the training set there are rules in the software to approximate it. In practice it often says the new word the way the training voice would have said it.

Filling in the holes in the training set is called interpolation: finding a new value approximately between two known values. If an AI were required to use the word "wonderful" as a noun in a sentence it might invent the word "wonderfulness" because it has basic English rules built into its exception handling.

If there is something that falls outside the bounds of the training, for example having to pronounce Worcester (the city in Massachusetts), it would pronounce it as three syllables. Without a specific entry in its training set it could never guess the correct two-syllable pronunciation.
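
A toy numeric version of the same limitation: interpolation gives good answers between known points, but outside the known range the best it can do is repeat the nearest endpoint.

import numpy as np

x_known = np.arange(0, 11)                # the "training set": x from 0 to 10
y_known = x_known ** 2                    # the truth within those bounds

print(np.interp(2.5, x_known, y_known))   # 6.5: a decent guess inside the bounds (truth is 6.25)
print(np.interp(20, x_known, y_known))    # 100.0: outside the bounds it just clamps (truth is 400)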

In an application where the plan is to put a general purpose AI in control of a machine in a factory, any holes in its training could be disastrous. If a problem arose that was outside its training set it would try an interpolated solution. For example if a pipe broke and there was no flow at the next station it might increase the input pressure to compensate.

The obvious solution is to increase the size of the training set to cover all possible situations. But that assumes the humans building the training set know all possible situations (see 737 Max).

The AI is trapped within the bounds of its training set. It is very good at interpolating between the elements within those bounds but it is non-functional outside them.

Where Does an Arithmetic Interpolator (AI) Succeed?

An AI is a marvelous librarian. It can take a question, sort through the vast store of human knowledge on the internet, and provide a concise answer (usually). More importantly, the human asking the question needs no computer skills to make it work, and the communication tool is not some elaborate electronic protocol. It is just the human voice. Once a child can speak (at about 18 months) they have access to all human knowledge. Now that is as close to real magic as you can get.

As you can tell I am a huge fan of AI. I love it for its strengths. I just want everyone to understand it is not a god; it is not even smart. It is just one tool among the many tools on the pegboard, each with its own applications and limitations. Humans are tool-using critters and AI is just a tool, no more.

An AI is also a marvelous assistant. Some chess history: the best human chess players are called grandmasters. In 1997 Deep Blue beat Garry Kasparov, who was considered one of the best grandmasters in existence. Chess playing programs were following Moore's Law and it was clear that within a few years you could have a chess application on your laptop that could beat any human.

Then a couple of average chess-playing humans with laptops running chess programs showed they could beat the best chess-playing AI. There are now contests called Advanced Chess or Centaur Chess (introduced by the same defeated Garry Kasparov) where humans augmented by chess programs compete at a level well beyond grandmaster and outperform any non-augmented AI.

Exponential Growth in AI Technologies

Like many technologies AI is following a kind of Moore's Law exponential with a very short doubling time. When a field is in an exponential, everything you've just done looks trivial and everything in front of you looks impossible. In Jobs Jobs Jobs I discuss how the cost of using very new applications like ChatGPT for Dlexa is in the impossible realm. But AI technology is moving so fast that within the 5 year project plan it is entirely possible it will become an affordable component.

Christopher Potts of Stanford Online (January 31, 2023), in GPT-3 and Beyond, discusses the rapid advances being made. He repeatedly says that things he thought were impossible are being solved surprisingly quickly. He shows the exponential growth in the size of the models and their remarkable ability to provide very satisfying (human-like) responses to complex requests.

Better Liars

Unfortunately it is becoming clear that the more advanced chat AIs turn out to be better liars. It's not intentional, because AIs cannot have motives. They just scour the internet for phrases that look like assertions and offer those up in well written text.

A recent 60 Minutes episode challenged ChatGPT with a topic it was not allowed to discuss. The AI was designed to change the topic with a "surprising" fact. ChatGPT asserted that 3% of the Antarctic ice sheet is penguin pee. Certainly a surprisingly clever way to change the topic. The truth, however, is that penguins don't pee: they excrete their salty waste as a paste. I'm not sure a billion-dollar computer application that gets caught lying on a global TV program can be called "mature" or "ready for serious employment."

Again: Not Intelligent

Christof Koch of the Allen Institute for Brain Science in Seattle said:
"You can simulate weather in a computer, but it will never be 'wet.'"
I have tried to find the original source for this quote without success, but there are several versions posted in 2014. It may be that Koch said it best, but the idea has been around for as long as there have been weather forecasting systems (the 1950s): a mixture of both surprising skill and frustration with their predictions.

There are many ways to understand the difference. Dean understands it as "the Math is not the Physics". To an outdoors person: "The Map is not the Territory". A "story" about love is not love. A story about a puppy pales in comparison to looking into the eyes of a real puppy. And finally: an AI is not intelligent.

AI Bot Types (Dean's Classification)

AI bots can be grouped by the kind of problem they try to solve:

Task Oriented Bot (TOB): Inputs pass through 1000s of rules (each entered by a programmer) to reach a specific goal, for either human input (plan a vacation) or sensor input (1st generation driverless car).

FAQ: A TOB that teaches the user about a specific topic, like the history of Latvia or how to play basketball.

TOB and FAQ are usually classed as Virtual Assistants
They are experts in the topic but do not handle anything outside that topic. They are also "closed": in the interaction map all paths and locations are accessible, every path leads to a location, and every location is connected to at least one path. They have predictable behavior, like an accounting system.

Large TOB: For large tasks the number of rules to map the task can become practically infinite. That is, there are an infinite number of paths that take you from where you are to where you want to be. However, the user of the system doesn't really care which path. Some paths are longer than others, so you just want any good path.

In this case the programmer creates a small set of general rules (still 1000s) that gets you close to the end point, then "guess and check". This is like golf, where you have to get from tee to hole and doing it in two strokes is considered excellent. The number of two-stroke paths is infinite but success is easy to measure. You define the tee and the hole and the AI finds a good path. A better AI uses more sophisticated math to do the "guess" part faster. (A toy version of the loop is sketched below.)

Large TOBs are found in logistics problems (FedEx), construction problems (build a skyscraper) and navigation problems (2nd gen driverless cars). The key difference from Generative AI (below) is that the data points (tee and hole) are verifiable truths and the AI is only creative in the path taken. A Generative AI creates new data points, which may be lies (like adding a hole closer to the tee).
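
Here is a toy Python version of that "guess and check" loop. The tee, the hole, and the step size are all invented for illustration:

import random

hole = 100.0                        # where we need to end up
position = 60.0                     # where the general rules got us

def distance(p):
    return abs(hole - p)            # success is easy to measure

for _ in range(1000):               # guess...
    guess = position + random.uniform(-5, 5)
    if distance(guess) < distance(position):
        position = guess            # ...and keep any guess that checks out better

print(round(position, 2))           # lands very close to 100.0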

Big Data: Searches through "everything" based on keywords, finding links between keywords and discovering new links and keywords.

Generative: From a big data training set, builds a set of machine learning parameters (billions of them) used to navigate through a model of the original data. The AI creates new data to fill in gaps found in the original data set. The user can then engage in a wide ranging discussion of topics inside the model, often beyond those available in the original data set.

Unexpected Inputs to AI Bots

A TOB's goal is to fill "slots". When planning a trip the user needs to decide on the vacation spot, so the bot starts with distance (around town or international). If the user types "France" then the bot has a list of a thousand destinations in France with attributes (like rural, urban, beach) and guides the user based on those attributes. As the bot fills slots with user decisions it builds up a context for the kind of vacation. User constraints (like $10,000 max and two weeks in August) may require edits to the slots. When all its slots are filled it moves on to the action phase and books the flights, hotels, and activities within the duration and cost constraints. A skeletal version of that loop is sketched below.
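
The slot names and questions here are hypothetical; the point is the shape of the loop:

slots = {"destination": None, "budget": None, "dates": None}
questions = {
    "destination": "Where would you like to go?",
    "budget": "What is your maximum budget?",
    "dates": "When do you want to travel?",
}

while any(value is None for value in slots.values()):
    for name in slots:
        if slots[name] is None:
            answer = input(questions[name] + " ").strip()
            slots[name] = answer or None   # an empty answer gets asked again
            break                          # re-check constraints after each answer

print("Action phase: booking with", slots)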

An FAQ bot's goal is to answer your questions and fill in details without being asked. If the topic is baking a chocolate cake the bot will explain the ingredients, explain the steps, and quiz you to make sure you understand. It will also follow your progress with suggestions at each step. If the topic is a new electric Ford F150 the bot will compare it with the old diesel model and advertise the new "computer on wheels" experience.

Rasa (open source and "pro" versions) is the most popular framework for TOB and FAQ bots. It has an added feature that rates responses and gives the coder a means to keep the user "happy".

For example, here is an old 1970s adventure game session:
You are in a forest. There is a wide river in the north.
?> N
You cannot go north. There is a wide river in the north.
?> Crap
You cannot go Crap. There is a wide river in the north.
?> S
You are in front of the door of a large white house.
?> Open Door
The door is locked.
?> Crap

Now with modern bot coding:
?> Crap
[response "Crap"; category: sad]
I'm sorry but there is hope. Further in the game there is a boat to cross the river. [This may be a bot hallucination: a lie]
?> Okay
[response "Okay"; category: happy]
Try another fine direction.
?> S
What an excellent choice.
You are in front of the door of a large white house.
?> Open Door
The door is locked.
?> Crap

If you look at the interaction flow in the modern model it would look surprisingly like an old Adventure Game Map.

BigData does not interact with a user
It searches through a very large collection of sites and documents (the Big Data), categorizes them, then, based on a set of keywords, reports on the "interesting ones".

For example it might search through "all" recent medical journal articles and reports looking for a specific kind of cancer. An example result would be that a note from Indonesia and an article from Argentina both report that a certain painkiller, no longer in use, cures a rare type of cancer. The search also reports that a research report from MIT shows the painkiller molecule is a member of an entirely new class of similar molecules. So the headline is: AI "discovers" a cure for cancer.

A more sinister bot might search for you in all social media (Facebook, TikTok, Reddit) and develop a profile of you with a high confidence level. That information would be sold to a Russian mob, who would use it to take out credit cards in your name.

Training an AI model with very large photographic data sets greatly improves the accuracy of "target" identification: better face recognition from sidewalk cameras (security), pothole recognition from space (street maintenance), tree height from airplane surveillance (forest management), and weed recognition from farm equipment (less pesticide use).

ChitChat (or Chat) bots are similar to FAQs
but trained on vastly larger data sets (ChatGPT)

This means a user interaction can cover more topics without running into the dreaded "I don't understand that". It also means the interaction map is not "closed": there are paths that just end without reconnecting to the rest of the map, so "unexpected inputs" are extremely common. Much of the work on these types of bots is figuring out what to do when you drive off the edge of the map. The common solutions are to build a bigger map or to get a human to code in a plug for the "plot hole".
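
A toy Python version of driving off the edge of the map, with a human-coded plug as the fallback (all the phrases are invented):

known_paths = {
    "weather": "Here is tomorrow's forecast...",
    "time": "It is exactly teatime.",
}

def respond(user_input):
    for keyword, reply in known_paths.items():
        if keyword in user_input.lower():
            return reply
    # Off the edge of the map: the hand-coded plug for the plot hole
    return "I don't understand that. Try asking about the weather or the time."

print(respond("What's the weather like?"))   # a known path
print(respond("Tell me a story"))            # the fallback plug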

Mycroft
The STT - Mycroft AI - TTS chain is more ambitious because the input/output is not text from a keyboard but the human voice.

Mycroft is an FAQ class bot that gets "unexpected inputs" all the time. The STT converts spoken voice to words, but the words are often "misheard". One of Mycroft's major tasks is trying to figure out from context what was asked (discussed in Dlexa's Mind).

Mycroft has a "skills" model, so if it figures out the user wants to plan a vacation it can transfer control to a Rasa bot that was built to plan vacations. When the Rasa skill ends, Mycroft can accept new questions.
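
For a flavor of the skills model, here is the skeleton of a Mycroft skill in Python. The intent and dialog file names are hypothetical, and the hand-off to Rasa is left as a stub:

from mycroft import MycroftSkill, intent_file_handler

class VacationSkill(MycroftSkill):
    # Hypothetical skill that hands vacation planning over to a Rasa bot.

    @intent_file_handler('plan.vacation.intent')
    def handle_plan_vacation(self, message):
        self.speak_dialog('starting.planner')
        # ...transfer control to the external Rasa vacation bot here...

def create_skill():
    return VacationSkill()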

For most inquiries Mycroft comes with a Wikipedia skill and other internet inquiry skills to answer questions.

As another skill example, there is a Zork game (like Adventure). If you ask to play Zork, the skill will start and Mycroft will speak:
"West of House. You are standing in an open field west of a white house, with a boarded front door. There is a small mailbox here."

The Last Mile

Christopher Potts suggests there is a lot of work to be done on the "last mile problem": getting these AI systems to do something practical in the real world. His interest is in search (like Google) and in context-specific tasks where the AI is an intelligent assistant to a skilled worker. He explains that the best AIs record human feedback while they are being used and include that input in later training sets.

He also suggests the best way to get progress in AIs is to put them in as many real world challenges (the last mile) as possible.

So building a Dlexa is exactly one of those "last mile" projects that may find things that lead to more advanced AIs. If nothing else such a project allows you to keep up with and appreciate the advances that are being made. In an exponentially growing technology just keeping up is hard work, but it will allow you to see both business and new learning opportunities.

The Dlexa Project

I think AIs could become good story tellers. And these stories could augment an inanimate object so it has a presence and personality.

I think the world would be a more magical place if we could have conversations with boulders and trees and hedges.

The Dlexa project is an attempt to employ an AI to make that magic happen in my front yard.

Next:

Check out the Builds tab to see the Dlexa project progress.