Replacing Intuition with Mathematical Language
This article explains the mathematical principles behind the Bayesian Engine covered in the BA02 episode, and why they are effective. The goal is to predict sales success probabilities precisely in an uncertain business environment. At its core, it walks through deriving optimal decision-making indicators by combining the Beta Distribution, which quantifies past experience, with the Binomial Distribution, which captures real-time signals from the field. In particular, it emphasizes how Conjugate Prior Distributions, which allow immediate updates without complex computation, maximize the system’s real-time performance and computational efficiency. The model also adopts Recursive Estimation, making a judgment the moment each data point arrives, which makes it technically well suited to modern business. In short, this document shows how sophisticated mathematical modeling turns vague intuition into reliable, data-driven insight.
In the fog of business, the sales directors, managers, and executives who must make decisions are always thirsty for one answer: “What is our winning rate in this situation, right now?” The ‘Bayesian Engine’, the heart of the Exa system, translates this abstract question into the most precise language we have: mathematics.
In this article, we analyze in depth the mathematical pillars that support this engine’s architecture in sales and similar settings, and why it is the ‘optimal solution’ in enterprise environments.
To be clear, Bayesian models based on MCMC or deep learning are invaluable assets for solving high-dimensional, complex problems. Nevertheless, part of technical objectivity is recognizing that the ‘mathematical efficiency’ and ‘clarity’ of the Beta-Binomial Model make it the most powerful weapon in specific domains such as sales success probability inference.
Note: Exa’s AI engine applies the Bayesian mathematics appropriate to each situation. Because the situations it is applied to vary widely, most branches of Bayesian mathematics come into play, and AI technologies already proven in the field, such as ML (Machine Learning), DL (Deep Learning), RL (Reinforcement Learning), and LLMs (Generative AI), are mobilized within the engine as business needs dictate. This article covers only the mathematics used in the sales episode.
With that context in mind, and while respecting each technology’s reason for existing, I intend to lay out the logic of why the techniques used in this episode are the ‘gold standard’ for this field.
1. Quantification of Experience: ‘Beta Distribution’ as Prior Distribution
All Bayesian inference starts from the subjectivity, intuition, and beliefs of the people involved (the stakeholders), or from researched and known empirical data about the domain; in other words, from ‘what we believe to start with’. In this type of scenario, the model holds the initial state of the business, or its accumulated experience, in a vessel called the Beta Distribution.
1.1 Mathematical Definition
The Beta distribution is a probability density function optimized for handling probability values between 0 and 1. The function is defined by the formula below. (Details of the Beta distribution are explained in another article dissecting the Beta distribution.)
$$f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$$
Here, the denominator $B(\alpha, \beta)$ is the Beta function, a normalization constant that makes the total probability integrate to 1, and the core drivers are the two parameters $\alpha$ and $\beta$.
- $\alpha$ (Alpha): the strength of accumulated evidence for success
- $\beta$ (Beta): the strength of accumulated evidence for risk or failure
1.2 Interpretation
Let’s look at the structure of the numerator in the formula. As $\alpha$ increases, the center of the distribution moves toward 1 (success); as $\beta$ increases, it moves toward 0 (failure).
At the beginning of a business, based on market statistics, we can assign values such as $\alpha = 2, \beta = 8$. This shapes the ‘prior empirical knowledge’ that “so far, 2 out of 10 attempts were successful” into a mathematical curve.
The expected probability is calculated as $\frac{\alpha}{\alpha+\beta} = \frac{2}{2+8} = 0.2$. This allows us to model prior experience, knowledge, or domain intuition as a number: “the success rate (or defect rate, or response rate) is 20%”. Here, 2 and 8 also represent the strength of belief: the larger the numbers, the stronger the belief. For example, $\alpha = 20, \beta = 80$ gives the same 20% success rate as $\alpha = 2, \beta = 8$, but with much greater strength of belief.
$\alpha$ and $\beta$ are hyperparameters that we set ourselves (or measure from past performance data) in order to model prior knowledge. As data (evidence) accumulates, the Bayesian Engine adjusts these values toward reality. This is the starting point of the process of tracking how well subjective probability matches actual data.
In other words, this model does not start from zero data; it starts as an intelligence that already possesses experience, as the sketch below illustrates.
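As a minimal sketch in Python (using the illustrative parameter values above; SciPy is assumed to be available, and this is not the engine’s actual code), we can see how Beta(2, 8) and Beta(20, 80) share a 20% mean while encoding very different strengths of belief:

```python
from scipy.stats import beta

# Two priors with the same 20% mean but very different strength of belief.
weak = beta(2, 8)      # "2 successes out of 10" worth of experience
strong = beta(20, 80)  # "20 successes out of 100" worth of experience

for name, prior in [("Beta(2, 8)", weak), ("Beta(20, 80)", strong)]:
    lo, hi = prior.interval(0.95)  # central 95% credible interval
    print(f"{name}: mean = {prior.mean():.2f}, 95% interval = ({lo:.2f}, {hi:.2f})")
```

The stronger prior concentrates its mass tightly around 0.2, which is exactly what ‘greater strength of belief’ means geometrically.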
2. Signals from the Field: ‘Binomial Distribution’ as Likelihood Function
Events occurring in the sales field (meetings, quote requests, etc.) eventually result in discrete outcomes: a ‘successful signal’ or a ‘non-successful signal’. The tool that captures this is the Binomial Distribution.
2.1 Mathematical Definition
The probability of succeeding $k$ times when an event with a success probability of $p$ is performed $n$ times is as follows:
$$P(X=k) = {n \choose k} p^k (1-p)^{n-k}$$
This formula quantifies the ‘facts (Evidence)’ heard from the field as a Likelihood: $P(X=k)$ measures how well the probability $p$ we assumed explains the actual result $k$. The system regards the outcome of every step entered by the salesperson as such a binomial trial, substituting refined mathematical signals for rough interactions.
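As an illustrative sketch (the counts are hypothetical; SciPy is assumed), we can evaluate how well several assumed values of $p$ explain an observed $k = 7$ successes in $n = 10$ events:

```python
from scipy.stats import binom

# Likelihood of k = 7 successful signals in n = 10 field events,
# evaluated under several assumed success probabilities p.
n, k = 10, 7
for p in (0.2, 0.5, 0.7):
    print(f"p = {p}: P(X = {k}) = {binom.pmf(k, n, p):.4f}")
# p = 0.7 yields the highest likelihood: the assumption that best
# explains the observed evidence.
```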
2.2 Weight of Evidence (WoE)
Why do some signals have high weights while others have low weights?
The Bayesian model used in this episode reflects the concept of Weight of Evidence (WoE), known from Claude Shannon’s information theory and Alan Turing’s cryptographic work, in the evidence data of the likelihood function (the Binomial Distribution).
WoE is the logarithm of the Likelihood Ratio between the probability of a signal appearing in the ‘success’ group and the probability of it appearing in the ‘failure’ group: $\text{WoE} = \ln \frac{P(\text{signal} \mid \text{success})}{P(\text{signal} \mid \text{failure})}$. The reason “mentioning a competitor at the final contract negotiation stage” is fatal is that the Information Gain when that signal occurs at that stage is much larger than at the initial stage.
The use of log-scale weights is the result of mathematically reflecting this ‘density of information’.
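As a small sketch (the conditional probabilities below are hypothetical, chosen only to illustrate the stage-dependent weighting):

```python
import math

def weight_of_evidence(p_signal_given_success: float,
                       p_signal_given_failure: float) -> float:
    """Log of the likelihood ratio: how strongly a signal discriminates
    between the 'success' group and the 'failure' group."""
    return math.log(p_signal_given_success / p_signal_given_failure)

# Hypothetical rates for the signal "competitor is mentioned":
woe_early = weight_of_evidence(0.30, 0.35)  # early stage: weak discrimination
woe_final = weight_of_evidence(0.05, 0.40)  # final stage: strong negative signal
print(f"early-stage WoE: {woe_early:+.2f}")  # about -0.15, almost no information
print(f"final-stage WoE: {woe_final:+.2f}")  # about -2.08, heavily weighted
```

The same surface event carries over ten times the evidential weight at the final stage, which is precisely why the engine cannot treat all signals as equal.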
2.3 Interpretation
With WoE folded in, each binomial trial no longer counts equally: a signal’s contribution to the likelihood scales with its information content. The same competitor mention that barely moves the estimate at first contact can swing it sharply at the final negotiation stage, while the system still reads every step entered by the salesperson as a binomial trial, just a weighted one.
3. Combination of Knowledge: The Magic of ‘Conjugate Prior’
The pinnacle of the Bayesian Engine lies in the update process that creates ‘tomorrow’s certainty’ by adding ‘today’s signal’ to ‘yesterday’s knowledge’.
3.1 Mathematical Combination (Posterior Update)
By Bayes’ theorem, the Posterior probability is calculated as follows:
$$P(p|Data) \propto P(Data|p) \times P(p)$$
When the Beta Distribution (Prior: prior knowledge, subjective belief) is combined with the Binomial Distribution (Likelihood: evidence data), a remarkable mathematical harmony occurs. The full derivation is covered in a separate article dissecting the Beta distribution, but the resulting formula below can be verified in standard mathematical textbooks.
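In outline, dropping every constant that does not depend on $p$ and multiplying the two kernels already reveals the result:
$$p^{\alpha-1}(1-p)^{\beta-1} \times p^{k}(1-p)^{n-k} = p^{(\alpha+k)-1}(1-p)^{(\beta+n-k)-1}$$
Normalizing by the Beta function $B(\alpha+k, \beta+n-k)$ yields the posterior density: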
$$P(p|k) = \frac{p^{(\alpha+k)-1}(1-p)^{(\beta+n-k)-1}}{B(\alpha+k, \beta+n-k)}$$
Looking at the result, the posterior distribution is itself a Beta distribution with parameters $\alpha + k$ and $\beta + n - k$, retaining the form of the prior Beta distribution.
3.2 Elegance of Analytical Solution
This is the power of the Conjugate Prior: the posterior distribution obtained by combining the Beta distribution (prior knowledge) with the Binomial distribution (evidence data) collapses back into a Beta distribution. The update is completed by simply adding the new signal counts to the existing values, without any complex integration. In computer science terms, this is a constant-time operation with a computational complexity of $O(1)$. This is why there is almost no server load even when processing thousands or tens of thousands of orders in real time, and it is the basis for the proposition: “The calculation is as light as a feather, but the result is as heavy as a rock.”
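The entire update fits in one line of code. As a minimal sketch (the function and variable names are hypothetical, not the engine’s API):

```python
def update_beta(alpha: float, beta: float, k: int, n: int) -> tuple[float, float]:
    """Conjugate Beta-Binomial update: two additions, O(1) regardless of history."""
    return alpha + k, beta + (n - k)

# Prior belief: Beta(2, 8), i.e. a 20% expected success rate.
alpha, beta = 2.0, 8.0
# New evidence from the field: k = 7 successful signals out of n = 10 events.
alpha, beta = update_beta(alpha, beta, k=7, n=10)
print(f"posterior: Beta({alpha:.0f}, {beta:.0f}), mean = {alpha / (alpha + beta):.2f}")
# posterior: Beta(9, 11), mean = 0.45 -- no integration, no sampling.
```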
4. Technical Justification: Why the ‘Beta-Binomial Model’ for This Problem?
Bayesian deep learning and MCMC (Markov Chain Monte Carlo) are core assets of modern data science. However, every tool has an optimal use case in which its capabilities are maximized.
For example, when calculating the on-time delivery probability of a Purchase Order (PO) through the Exa Bayesian Engine, an MCMC simulation model is very effective: MCMC handles large-scale batch calculations and can sophisticatedly reflect not only typical on-time delivery data but also ‘outlier’ data such as delivery delays.
Ultimately, what matters most is the flexibility to select and apply the optimal model for the complex variables in the field; the importance of appropriate model selection cannot be overstated.
4.1 Roles of MCMC and Bayesian Deep Learning
MCMC excels at approximating high-dimensional probability distributions in which thousands of variables are intertwined. Deep learning-based Bayesian methods are essential for extracting complex patterns from unstructured data (images, voice, etc.). Both are powerful solutions that find answers through massive simulation and sampling.
$$A(x^*, x_t) = \min \left( 1, \frac{P(x^*)g(x_t|x^*)}{P(x_t)g(x^*|x_t)} \right)$$
(The Metropolis-Hastings acceptance probability used in MCMC sampling; approximating the target distribution typically requires tens of thousands of iterations.)
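To see the cost difference, here is a toy Metropolis sampler for the very same Beta-Binomial posterior (a deliberately simple sketch; real MCMC implementations use tuned proposals, burn-in, and convergence diagnostics):

```python
import math
import random

def log_posterior(p: float, alpha=2, beta=8, k=7, n=10) -> float:
    """Unnormalized log kernel of the Beta(alpha + k, beta + n - k) posterior."""
    return (alpha + k - 1) * math.log(p) + (beta + n - k - 1) * math.log(1 - p)

random.seed(0)
p, samples = 0.5, []
for _ in range(50_000):                  # tens of thousands of iterations...
    proposal = p + random.gauss(0, 0.1)  # symmetric random-walk proposal
    if 0 < proposal < 1 and random.random() < math.exp(
        min(0.0, log_posterior(proposal) - log_posterior(p))
    ):
        p = proposal                     # accept with Metropolis probability
    samples.append(p)

print(f"MCMC estimate of the posterior mean: {sum(samples) / len(samples):.2f}")
# Converges to roughly 0.45: the exact mean of Beta(9, 11), which the
# conjugate update computed above with two additions.
```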
4.2 Unique Strengths of the Beta-Binomial Model
On the other hand, in domains with a clear target of ‘success and failure’, such as sales success rate prediction, the Analytical Solution provided by the Beta-Binomial model becomes the ‘gold standard’.
- Real-time: Immediate response is possible without heavy sampling.
- Explainability: It can clearly explain why the probability changed through the increase or decrease of $\alpha$ and $\beta$.
We would use deep learning and MCMC for more complex problems, but where rapid business decision-making is required, we have chosen this clearest and most elegant method.
5. Revolution of Architecture: Recursive Bayesian Estimation
In an era of exploding data, reloading ‘all past data’ every time is inefficient. The engine of this model adopts a Recursive architecture that focuses on the ‘essence of information’.
This is the deepest root of this model:
All past meeting logs are already perfectly compressed into just two numbers of the current state, $\alpha$ and $\beta$ (the posterior distribution updated by combining prior knowledge with data evidence). When a new signal comes in, the system simply adds it to the current state instead of rummaging through past logs.
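As a minimal illustration (the class name and signal stream are hypothetical; this is a sketch of the principle, not the engine’s code):

```python
class RecursiveBayesianEstimator:
    """The entire history is compressed into two numbers, alpha and beta;
    each incoming signal updates the state in place, with no log replay."""

    def __init__(self, alpha: float, beta: float):
        self.alpha, self.beta = alpha, beta  # compressed state = all past data

    def observe(self, success: bool) -> float:
        """Fold one new field signal into the state; return the new estimate."""
        if success:
            self.alpha += 1
        else:
            self.beta += 1
        return self.alpha / (self.alpha + self.beta)

# Hypothetical stream of field signals arriving one at a time.
engine = RecursiveBayesianEstimator(alpha=2, beta=8)
for signal in [True, True, False, True, True]:
    print(f"signal = {signal!s:5} -> estimated win rate {engine.observe(signal):.2f}")
```

No matter how long the history grows, the memory footprint and the per-signal cost stay constant.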
The Principle Behind NASA’s Orbit Correction and Self-Driving Cars’ Real-Time Localization
As a technique that infers state in real time whenever data arrives sequentially, this model shares the exact same mathematical lineage as the Kalman Filter, which tracked the position of spacecraft in NASA’s Apollo program.
Traditional statistics start analysis “after all data is gathered,” but Recursive Bayesian makes judgments “as soon as information occurs.” This is the most rigorous algorithm for managing uncertainty in ERP environments where real-time performance is vital.
When Mathematics Becomes a Tool for Business
Through [Appendix Part 1], we have seen the mathematical order hidden beneath the massive iceberg of the Bayesian Engine.
- The Beta Distribution is a vessel that holds your experience.
- The Binomial Distribution is a filter that accepts hot signals from the field.
- And through the blessing of the Conjugate Prior, the system derives the most accurate conviction in the lightest way.
This is not a simple statistical tool. It is a ‘Decision Compass’ that tracks and guides your business sophisticatedly like a spacecraft orbit.
[Next Teaser: Part 2]
Why does probability drop on days of ‘Silence’ when no data comes in?
Next time, it is time to examine the inside of the ‘Paradox of Silence and Log Weights’ from the perspective of Information Theory.
