The Supporting Scaffolding
So in part one, we outlined the purpose of the AI core as a next-word predictor, or next 'sea hieroglyph' predictor. That's great! But things get more complicated when the input itself is more complex, or asks questions. Get ready to enter the world of 'belief', because you're going to need it.
Let's say, for example, the input is: 'Hi Grok. I'd like to know about the history of Rome and what the weather is going to be like today in London.'
Remember that the core is just the sausage-machine next-word (token) predictor.
So - we enter the world of 'Intent Parsing', 'Chain of Thought' and 'Tool Calls' - and all of this happens before anything hits the core.
These things we can call pre-processors: relatively simple tasks (computationally) that manipulate the user input into something that's core-ready.
Let's deal with 'Intent Parsing' first. Basically, this breaks the input down into more manageable chunks. So in this example we have 'Hi Grok' (greeting), 'I'd like to know about the history of Rome' (question that requires factual existing knowledge) and 'What's the weather like today in London?' (question that requires external information).
This, at the next layer, is broken down into CoT (Chain of Thought). It's best imagined as a to-do list, really.
• Hi Grok (Greeting) - Respond Politely
• I'd like to know about the history of Rome (Question that requires factual existing knowledge) - Retrieve relevant facts from existing knowledge
• What's the weather like today in London (Question that requires external knowledge) - Get Information from Weather Website for London (Tool Call)
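A toy sketch of what this intent-parse-plus-to-do-list step might look like. The rules here are hand-written for this one example input; a real system would use a model for this, but the shape of the output - a list of (chunk, intent, action) items - is the idea:

```python
# Toy intent parser: split the input into chunks and tag each one with
# an intent and an action. Hard-coded rules for our example only.

def parse_intents(user_input):
    """Return a to-do list of (chunk, intent, action) triples."""
    todo = []
    if "Hi Grok" in user_input:
        todo.append(("Hi Grok", "greeting", "respond politely"))
    if "history of Rome" in user_input:
        todo.append(("history of Rome", "factual question",
                     "retrieve facts from existing knowledge"))
    if "weather" in user_input and "London" in user_input:
        todo.append(("weather today in London", "external question",
                     "tool call: fetch weather for London"))
    return todo

todo_list = parse_intents(
    "Hi Grok. I'd like to know about the history of Rome "
    "and what the weather is going to be like today in London"
)
for chunk, intent, action in todo_list:
    print(f"- {chunk} ({intent}) -> {action}")
```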
As you can see, for the final item in the list, Grok doesn't yet know what the weather will be like in London, so a Tool Call is made to go and get that information. This happens outside the core: Grok goes and checks it on the internet.
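A minimal sketch of such a tool call, with the weather data stubbed in. The function name and response shape here are invented for illustration; a real deployment would hit an actual weather API and handle errors and timeouts:

```python
# Toy tool call: the scaffolding runs this outside the core, then turns
# the result into plain text to append to the core's input bundle.

def get_weather(city):
    """Pretend to fetch today's weather for a city (stubbed data)."""
    fake_responses = {
        "London": {"conditions": "cloudy with patchy rain", "avg_temp_c": 13},
    }
    return fake_responses.get(city, {"conditions": "unknown", "avg_temp_c": None})

result = get_weather("London")
tool_output = (f"The weather today in London is {result['conditions']}. "
               f"Average temp {result['avg_temp_c']} degrees.")
print(tool_output)
```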
Having got this information, all of it is bundled together for the core to operate on. The pre-processing stage is complete - and yes, the input to the core is more complicated than just what you typed.
• Hi Grok (Greeting) - Respond Politely
• I'd like to know about the history of Rome (Question that requires factual existing knowledge) - Retrieve relevant facts from existing knowledge
• What's the weather like today in London (Question that requires external knowledge) - Get Information from Weather Website for London (Tool Call)
• The weather today in London is cloudy with patchy rain. Average temp 13 degrees.
• Structure responses with bullet points. Keep Rome history brief.
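A sketch of what that bundling step might look like. The exact text format here is an assumption - real systems use structured prompt templates - but the point stands: the core receives far more than what the user typed:

```python
# Toy bundling step: join the to-do list, the tool result and the style
# rules into one block of text that becomes the core's actual input.

todo = [
    "Hi Grok (greeting) - respond politely",
    "History of Rome (factual question) - retrieve facts from existing knowledge",
    "Weather in London (external question) - tool call made",
]
tool_results = ["The weather today in London is cloudy with patchy rain. "
                "Average temp 13 degrees."]
style_rules = ["Structure responses with bullet points.",
               "Keep Rome history brief."]

core_input = "\n".join(todo + tool_results + style_rules)
print(core_input)  # this whole bundle, not just the user's text, hits the core
```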
This is where a 'step of faith' is usually required, but try to keep this in mind: in our earlier example of 'The cat sat on the mat', you don't perceive it as a string of words in your head - you see it as a picture. It's very similar for the core once it has performed its initial mathematics on the input above. All of the words get mixed together into a 'mathematical number picture' so that the core can try to predict the next word.
The full picture is what's known as the 'End State'. It is this that the Grok core will use to try to predict the next word: the End State Number Picture.
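To make that less mystical, here's a toy sketch of words becoming one bundle of numbers. Real models use learned embeddings with thousands of dimensions and attention rather than a simple average - this only shows the shape of the idea:

```python
# Toy 'number picture': each word becomes a small vector of numbers,
# and the vectors are combined into a single state to predict from.

import random

def embed(word, dims=4):
    """Map a word to a small, repeatable vector of numbers (toy)."""
    rng = random.Random(word)  # seeded by the word itself, so repeatable
    return [round(rng.uniform(-1, 1), 2) for _ in range(dims)]

words = "the cat sat on the mat".split()
vectors = [embed(w) for w in words]

# Crude 'end state': average all the word vectors into one picture.
end_state = [round(sum(col) / len(col), 2) for col in zip(*vectors)]
print(end_state)  # one list of numbers standing in for the whole input
```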
And that's about it, really. The data is processed just as before: the response words are chosen from the list of known words, and probability (temperature) determines the actual choice from a list of possibles.
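Here's a toy sketch of how temperature steers that choice. The candidate words and their scores are invented; a real model scores its whole vocabulary. Lower temperature sharpens the distribution (safer picks), higher temperature flattens it (more adventurous picks):

```python
# Toy temperature sampling: turn raw scores into probabilities with a
# temperature-scaled softmax, then pick one word at random.

import math
import random

def sample_next_word(scores, temperature=1.0, rng=None):
    """Pick a word from {word: score} using temperature-scaled softmax."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    words = list(scores)
    # Divide scores by the temperature before exponentiating: small T
    # exaggerates the gaps between scores, large T evens them out.
    exps = [math.exp(scores[w] / temperature) for w in words]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(words, weights=probs, k=1)[0]

candidates = {"mat": 2.0, "sofa": 1.0, "roof": 0.2}
print(sample_next_word(candidates, temperature=0.2))  # almost always 'mat'
print(sample_next_word(candidates, temperature=2.0))  # more variety
```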
Finding that hard? It's not desperately easy to explain, so let me try another way. Imagine I were to show you a picture postcard of London Bridge with the words 'What's this?' at the bottom. Your brain (without any words flowing into your ears) can take that all onboard from a snapshot image. Then, as a human, you can immediately start to respond: Oh... it's... London... Bridge...
So in a lot of ways it's just the same as doing that, but with some maths. Hope that helps. Part 3 coming soon!