Beta-Binomial model with posterior probability, expected loss, credible intervals, and Python code.
Beta(1, 1) = uninformative uniform prior. Use higher values for informative priors.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# Data
visitors_a, conversions_a = 1000, 120
visitors_b, conversions_b = 1000, 145
# Prior: Beta(1, 1)
prior_alpha, prior_beta = 1, 1
# Posterior distributions
alpha_a = prior_alpha + conversions_a
beta_a = prior_beta + (visitors_a - conversions_a)
alpha_b = prior_alpha + conversions_b
beta_b = prior_beta + (visitors_b - conversions_b)
post_a = stats.beta(alpha_a, beta_a)
post_b = stats.beta(alpha_b, beta_b)
# Monte Carlo simulation (100k samples); seeded for reproducibility
rng = np.random.default_rng(42)
samples_a = post_a.rvs(100_000, random_state=rng)
samples_b = post_b.rvs(100_000, random_state=rng)
# P(B > A)
prob_b_beats_a = np.mean(samples_b > samples_a)
print(f"P(B > A) = {prob_b_beats_a:.4f}")
# Expected loss
loss_b = np.mean(np.maximum(samples_a - samples_b, 0))
loss_a = np.mean(np.maximum(samples_b - samples_a, 0))
print(f"Expected loss (choosing A) = {loss_a:.6f}")
print(f"Expected loss (choosing B) = {loss_b:.6f}")
# 95% Credible intervals
print(f"95% CI (A): [{post_a.ppf(0.025):.4f}, {post_a.ppf(0.975):.4f}]")
print(f"95% CI (B): [{post_b.ppf(0.025):.4f}, {post_b.ppf(0.975):.4f}]")
# Relative lift
lift = (post_b.mean() - post_a.mean()) / post_a.mean()
print(f"Relative lift: {lift:.2%}")
# Plot posterior distributions
x = np.linspace(
min(post_a.ppf(0.001), post_b.ppf(0.001)),
max(post_a.ppf(0.999), post_b.ppf(0.999)),
1000
)
plt.figure(figsize=(10, 6))
plt.plot(x, post_a.pdf(x), 'b-', lw=2, label=f'Control A: Beta({alpha_a}, {beta_a})')
plt.plot(x, post_b.pdf(x), color='orange', lw=2, label=f'Variant B: Beta({alpha_b}, {beta_b})')
plt.fill_between(x, post_a.pdf(x), alpha=0.15, color='blue')
plt.fill_between(x, post_b.pdf(x), alpha=0.15, color='orange')
plt.xlabel('Conversion Rate')
plt.ylabel('Density')
plt.title(f'Bayesian A/B Test - P(B>A) = {prob_b_beats_a:.1%}')
plt.legend()
plt.tight_layout()
plt.show()

Bayesian A/B testing uses Bayes' theorem to update prior beliefs about conversion rates with observed data. Unlike frequentist tests that output p-values, Bayesian tests give you the probability that one variant is better than another - a more intuitive metric for decision-making.
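For this conjugate model, P(B > A) can also be computed without Monte Carlo sampling, by numerically integrating the posterior densities. A minimal sketch, reusing the posterior parameters derived above (Beta(121, 881) for A, Beta(146, 856) for B):

```python
from scipy import stats
from scipy.integrate import quad

# Posteriors from the worked example: Beta(1+120, 1+880) and Beta(1+145, 1+855)
post_a = stats.beta(121, 881)
post_b = stats.beta(146, 856)

# P(B > A) = integral over [0, 1] of pdf_B(x) * CDF_A(x) dx
prob_b_beats_a, _ = quad(lambda x: post_b.pdf(x) * post_a.cdf(x), 0, 1)
print(f"Exact P(B > A) = {prob_b_beats_a:.4f}")
```

This removes the sampling noise of the Monte Carlo estimate and is useful for verifying it.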
Beta(1, 1) is the standard uninformative (uniform) prior, treating all conversion rates as equally likely. If you have historical data, use an informative prior like Beta(10, 90) for a ~10% baseline rate. The prior matters less as you collect more data.
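A sketch of the informative-prior case described above, using Beta(10, 90) (a ~10% baseline rate carrying the weight of 100 pseudo-observations) with the same counts as variant A:

```python
from scipy import stats

# Informative prior: ~10% baseline, worth 100 pseudo-observations
prior_alpha, prior_beta = 10, 90

visitors, conversions = 1000, 120  # same data as variant A above

# Conjugate update: add observed successes/failures to the prior counts
posterior = stats.beta(prior_alpha + conversions,
                       prior_beta + (visitors - conversions))
print(f"Posterior mean: {posterior.mean():.4f}")
```

The posterior mean lands between the observed 12.0% and the prior's 10%, pulled only slightly toward the prior because the data outweigh the 100 pseudo-observations ten to one.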
Expected loss quantifies the cost of making the wrong decision. If you choose variant B, the expected loss is E[max(θ_A - θ_B, 0)] - the average amount by which A could be better. A common decision rule: implement B when expected loss < 0.1% (your "cost of being wrong" threshold).
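The decision rule can be sketched in a few lines, reusing the posterior parameters from the worked example; the 0.1% threshold is the illustrative figure from the text, not a universal constant:

```python
import numpy as np

rng = np.random.default_rng(42)
samples_a = rng.beta(121, 881, 100_000)  # posterior for A (data above)
samples_b = rng.beta(146, 856, 100_000)  # posterior for B

threshold = 0.001  # 0.1% "cost of being wrong" threshold (illustrative)

# Expected loss of choosing B: average shortfall when A is actually better
loss_b = np.mean(np.maximum(samples_a - samples_b, 0))
if loss_b < threshold:
    print(f"Ship B: expected loss {loss_b:.6f} is below {threshold}")
else:
    print(f"Keep testing: expected loss {loss_b:.6f} exceeds {threshold}")
```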
Frequentist tests give you a p-value (probability of data given null hypothesis), while Bayesian tests give you P(B > A) - the direct probability one variant beats another. Bayesian tests also allow early stopping without inflating error rates, and provide credible intervals instead of confidence intervals.
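For contrast, a frequentist test on the same counts. This sketch uses Fisher's exact test from SciPy on the 2x2 contingency table:

```python
from scipy import stats

# Rows: variants A and B; columns: converted, not converted
table = [[120, 880],
         [145, 855]]
_, p_value = stats.fisher_exact(table)
print(f"Two-sided p-value: {p_value:.4f}")
```

Note what each number answers: the p-value measures how surprising the data are under the null hypothesis of no difference, while P(B > A) from the Bayesian analysis directly states how likely B is to be better.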
Unlike frequentist tests, you can check Bayesian results at any time without penalty. Common stopping rules: (1) P(B > A) > 95% or < 5%, (2) Expected loss < your threshold (e.g., 0.1%), or (3) The value of remaining information is less than the cost of continuing the test.
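Rules (1) and (2) can be wrapped in a small helper evaluated at each peek; `check_stopping` and its default thresholds are illustrative, not a standard API:

```python
import numpy as np

def check_stopping(conv_a, n_a, conv_b, n_b,
                   prob_bound=0.95, loss_threshold=0.001, n_samples=100_000):
    """Evaluate stopping rules (1) and (2) on the current counts.

    Uses a Beta(1, 1) prior; thresholds are illustrative defaults.
    """
    rng = np.random.default_rng(0)
    samples_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, n_samples)
    samples_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, n_samples)

    prob_b = np.mean(samples_b > samples_a)
    loss_b = np.mean(np.maximum(samples_a - samples_b, 0))
    loss_a = np.mean(np.maximum(samples_b - samples_a, 0))

    if prob_b > prob_bound or loss_b < loss_threshold:
        return "stop: ship B"
    if prob_b < 1 - prob_bound or loss_a < loss_threshold:
        return "stop: keep A"
    return "continue"

# Early peek with little data - neither rule fires yet
print(check_stopping(12, 100, 15, 100))    # → continue
# Full data from the worked example - the expected-loss rule fires
print(check_stopping(120, 1000, 145, 1000))  # → stop: ship B
```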