Introduction
In most courses and universities, artificial intelligence is presented as a rigid technical ritual:
Choose a model, define a loss function, run gradient descent, and get numbers.
This approach succeeds in running models… but it fails to explain them.
And the result? Many people know how to use models, but only a few understand why they work in the first place.
The often unspoken truth:
Artificial intelligence is not just algorithms.
It is a probabilistic way of seeing the world.
And when you realise this, the whole picture flips.
The problem with the way artificial intelligence is taught
When artificial intelligence is taught as a series of equations, something dangerous happens:
You learn what to do, not what you are assuming.
You memorise loss functions without knowing where they came from.
You apply ready-made models without understanding their limits or risks.
And this is where the real mistakes begin in practice.
To understand artificial intelligence deeply, we need to return to the simplest model:
Linear regression — but this time, from a completely different angle.
Linear regression as it has never been explained to you before
The traditional explanation says:
We want a line that passes "close" to the points, so we minimise the mean squared error (MSE).
But the real question is:
Why the squared error? And why this specific form?
The answer is not in pure mathematics… but in probabilities.
Let’s tell the story instead of the equation
Imagine a person (let's call him Ahmed) who wants to predict the outcome of a sports match.
He believes that the result depends on two factors:
The number of training hours
The energy level of the team on match day (from 0 to 10)
Ahmed does not claim perfection.
He knows that reality is full of noise, errors, and unseen factors.
So he does not say:
"The result = an exact equation"
Instead, he implicitly says:
"The result is often close to this equation, but with random error"
And here the big conceptual shift happens.
The output is not a number... but a probability distribution.
When we think this way, the model's output becomes a random variable, not a fixed value.
To be more precise:
The prediction lies at the centre of the distribution.
The error follows a normal (Gaussian) distribution with a mean of zero.
This simple assumption changes everything.
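To make the shift concrete, here is a minimal sketch in Python of Ahmed's model seen this way. The weights, noise level, and sample size are invented for illustration; the point is only that the model produces a distribution, not a single number:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for Ahmed's two factors (illustrative only)
w_hours, w_energy, bias = 0.8, 0.5, 2.0
sigma = 1.5  # standard deviation of the Gaussian noise

hours = rng.uniform(0, 40, size=200)   # training hours
energy = rng.uniform(0, 10, size=200)  # energy level on match day (0 to 10)

# The prediction is the centre of the distribution...
mean = bias + w_hours * hours + w_energy * energy

# ...and the observed result is that centre plus Gaussian noise with mean zero.
result = mean + rng.normal(0.0, sigma, size=200)

# The model's "output" is the whole distribution N(mean, sigma^2),
# not the single number `mean`.
```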
Because then you do not ask:
What is the least error?
But you ask:
What parameter values make the data I have seen most likely to occur?
Here the concept of Maximum Likelihood comes in.
Linear regression, from this perspective, is not about "minimising error".
Rather, it is about maximising likelihood.
We are looking for the weights that make the data we have:
Logical.
Probable.
Not surprising to the model.
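A small numeric sketch of that idea, using toy data, an assumed noise level of 1, and a brute-force grid search instead of a real optimiser. The weight with the highest likelihood is the one under which the observed data is least surprising:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data generated from a known line (true slope 1.5) plus Gaussian noise
x = rng.uniform(0, 10, size=100)
y = 1.5 * x + rng.normal(0.0, 1.0, size=100)

def log_likelihood(w, sigma=1.0):
    """Gaussian log-likelihood of the observed data under slope w."""
    resid = y - w * x
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - resid**2 / (2 * sigma**2))

# Scan candidate slopes and keep the one that makes the data most likely
candidates = np.linspace(0.0, 3.0, 301)
best = candidates[np.argmax([log_likelihood(w) for w in candidates])]
print(best)  # close to the true slope 1.5
```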
And when we write this likelihood mathematically, then take the logarithm, and simplify it...
We surprisingly arrive at the same famous loss function:
Mean Squared Error.
Not because it is an arbitrary choice.
But because it is the natural result of assuming that the error follows a normal distribution.
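Sketched in symbols, with $y_i$ the observed result, $x_i$ the inputs, and noise $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$:

```latex
p(y_i \mid x_i, w) = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{(y_i - w^\top x_i)^2}{2\sigma^2}\right)

\log L(w) = \sum_{i=1}^{n} \log p(y_i \mid x_i, w)
          = -\frac{n}{2}\log(2\pi\sigma^2)
            - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - w^\top x_i)^2

\arg\max_w \log L(w)
  \;=\; \arg\min_w \frac{1}{n}\sum_{i=1}^{n} (y_i - w^\top x_i)^2
```

The first term of the log-likelihood does not depend on $w$, so maximising the likelihood is exactly minimising the mean squared error.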
What does this mean in practice?
This means that every artificial intelligence model carries hidden assumptions about the world:
About noise
About the nature of data
About what is 'normal' and what is 'anomalous'
And when you do not understand these assumptions, you are using a tool without understanding its internal logic.
This explains:
Why models fail in certain environments
Why they sometimes lead to dangerous biases
And why simply 'running the model' is not enough to build a reliable intelligent system
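One way to see such a failure, using a hypothetical dataset whose numbers are invented for illustration: a single non-Gaussian outlier drags the least-squares fit, because MSE silently assumes the noise is Gaussian.

```python
import numpy as np

rng = np.random.default_rng(1)

# Clean linear data: y = 2x + small Gaussian noise
x = np.linspace(0, 10, 50)
y = 2.0 * x + rng.normal(0.0, 0.5, size=50)

# Least-squares slope on clean data (the MSE minimiser)
slope_clean = np.polyfit(x, y, 1)[0]

# One extreme outlier: noise that is NOT Gaussian
y_bad = y.copy()
y_bad[-1] += 200.0

slope_outlier = np.polyfit(x, y_bad, 1)[0]

print(slope_clean)    # close to the true slope 2
print(slope_outlier)  # pulled well away from 2 by a single point
```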
Artificial intelligence = Philosophy + Probabilities + Engineering
The best practitioners of artificial intelligence today are not those who memorise algorithms,
but those who understand:
What does the model assume?
When do these assumptions fail?
And what are the ethical and commercial risks involved?
In 2026, the difference will not be between those who 'know AI' and those who do not,
but between those who understand it as a system of thought and those who treat it as a black box.
In summary
When you minimise MSE, you are not just taking a mathematical step.
You are implicitly saying:
'I assume that the errors are normally distributed, and that this model explains the world in this way.'
And when you understand this, the way you build models changes,
and you become more aware, more responsible, and more professional.
Artificial intelligence does not start with code.
It starts with the way you see reality.
🚀 With Echo Media
At Echo Media, we do not write about artificial intelligence as a passing trend;
we interpret it as a tool for thinking, decision-making, and real impact on business and humanity.
If you are:
Building a product
Leading a team
Or want to understand AI in depth, not superficially
Follow our upcoming articles, and share this article with anyone who thinks that artificial intelligence is just code.
Understanding is the new competitive advantage.