Posted by Dr Bouarfa Mahi on 07 Feb, 2025

The sigmoid function plays a central role in artificial intelligence, machine learning, and decision theory. While it has been widely used as an activation function, its origins have remained largely empirical. In this article, we demonstrate that the sigmoid function emerges naturally from the fundamental definition of entropy and structured knowledge accumulation. By deriving the sigmoid function directly from entropy minimization, we provide a first-principles explanation for its effectiveness in learning systems. This discovery offers profound insights into the relationship between entropy reduction, knowledge structuring, and decision-making in AI.
Entropy is a measure of uncertainty in a system, fundamental to information theory and AI learning processes. Traditional approaches view the sigmoid function as a convenient non-linearity, but its deep connection to entropy minimization has remained largely unexplored.
This article develops a rigorous derivation of the sigmoid function from first principles, showing that it arises naturally when we model how intelligence structures knowledge over the course of probabilistic decision-making. Importantly, the sigmoid function emerges from entropy because we have redefined entropy beyond Shannon entropy, aligning it with structured knowledge evolution. It is crucial to note, however, that the relationship is not circular: although the sigmoid function appears in this redefinition of entropy, entropy itself is not derived from the sigmoid function.
We begin with the fundamental entropy equation, which expresses entropy as an integral of structured knowledge accumulation over the decision probability:

$$H = -\int z \, dp$$

where $H$ is the entropy of the system (in nats), $z$ is the structured knowledge accumulated by the system, and $p$ is the decision probability.

Differentiating Shannon's binary entropy, $H(p) = -p \ln p - (1 - p) \ln(1 - p)$, with respect to $p$ gives

$$\frac{dH}{dp} = \ln\left(\frac{1 - p}{p}\right),$$

which describes the rate of entropy change as the decision probability evolves.

We express structured knowledge $z$ as the negative of this rate, so that knowledge accumulates exactly as entropy is reduced:

$$z = -\frac{dH}{dp}.$$

Substituting our entropy derivative:

$$z = -\ln\left(\frac{1 - p}{p}\right) = \ln\left(\frac{p}{1 - p}\right).$$

Using the logarithm identity $e^{\ln x} = x$ and exponentiating both sides:

$$e^{z} = \frac{p}{1 - p},$$

which can be rewritten as:

$$p\left(1 + e^{z}\right) = e^{z}.$$

Rearranging for $p$:

$$p = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}.$$

This is exactly the sigmoid function:

$$p = \sigma(z) = \frac{1}{1 + e^{-z}}.$$
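To make the derivation concrete, here is a minimal numerical check in Python with NumPy (the function names are ours, chosen for this sketch): it confirms that the negative derivative of the binary entropy equals the log-odds $z = \ln(p/(1-p))$, and that inverting that mapping recovers the sigmoid.

```python
import numpy as np

def binary_entropy(p):
    """Shannon binary entropy in nats: H(p) = -p ln p - (1-p) ln(1-p)."""
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def structured_knowledge(p):
    """z = -dH/dp = ln(p / (1-p)), the log-odds of the decision probability."""
    return np.log(p / (1 - p))

def sigmoid(z):
    """The sigmoid obtained by solving z = ln(p/(1-p)) for p."""
    return 1.0 / (1.0 + np.exp(-z))

p = np.linspace(0.1, 0.9, 801)

# Numerical derivative of H(p) should match -z(p) to within finite-difference error.
dH_dp = np.gradient(binary_entropy(p), p, edge_order=2)
assert np.allclose(-dH_dp, structured_knowledge(p), atol=1e-4)

# The sigmoid inverts the knowledge mapping: sigma(z(p)) = p.
assert np.allclose(sigmoid(structured_knowledge(p)), p)
print("checks passed: z = -dH/dp and sigma(z(p)) = p")
```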
This derivation proves that the sigmoid function is not an arbitrary activation function: it emerges directly from entropy minimization.
This provides the first theoretical foundation for why sigmoid functions work so effectively in AI: they naturally model the evolution of structured intelligence.
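One consequence worth making explicit (our observation, which follows directly from the derivation): differentiating the sigmoid gives

$$\frac{dp}{dz} = \sigma(z)\left(1 - \sigma(z)\right) = p\,(1 - p),$$

so the decision probability responds most strongly to new knowledge at $p = 1/2$, precisely where the binary entropy is maximal, and saturates as the decision becomes certain.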
This discovery has far-reaching implications for how we design and understand AI models: probabilistic decision-making follows a sigmoid function as a direct consequence of entropy reduction and structured knowledge accumulation.
This provides a first-principles derivation of the sigmoid function and a new foundation for AI learning theory. Future work should explore entropy-aware optimization techniques and expand this framework to multi-class decision systems such as the softmax function.
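As a pointer toward that multi-class extension, here is a minimal sketch (our illustration, not part of the derivation above): replacing the scalar knowledge $z$ with a knowledge vector and applying the same exponentiate-and-normalize step yields the softmax, which reduces to the sigmoid in the two-class case.

```python
import numpy as np

def softmax(z):
    """Exponentiate a knowledge vector and normalize so the
    class probabilities sum to 1 (multi-class analogue of the sigmoid)."""
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

# With two classes, softmax reduces to the sigmoid of the knowledge gap:
# softmax([z1, z2])[0] = 1 / (1 + exp(-(z1 - z2))).
z = np.array([2.0, 0.5])
assert np.isclose(softmax(z)[0], 1.0 / (1.0 + np.exp(-(z[0] - z[1]))))
print(softmax(z))
```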
This work redefines the mathematical foundations of AI learning. Instead of an exercise in arbitrary optimization, AI learning can now be seen as an entropy-structuring process that follows a well-defined mathematical trajectory.