Beta-Binomial model with posterior probability, expected loss, credible intervals, and Python code.
Beta(1, 1) = uninformative uniform prior. Use higher values for informative priors.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# Data
visitors_a, conversions_a = 1000, 120
visitors_b, conversions_b = 1000, 145
# Prior: Beta(1, 1)
prior_alpha, prior_beta = 1, 1
# Posterior distributions
alpha_a = prior_alpha + conversions_a
beta_a = prior_beta + (visitors_a - conversions_a)
alpha_b = prior_alpha + conversions_b
beta_b = prior_beta + (visitors_b - conversions_b)
post_a = stats.beta(alpha_a, beta_a)
post_b = stats.beta(alpha_b, beta_b)
# Monte Carlo simulation (100k samples); seeded for reproducibility
rng = np.random.default_rng(42)
samples_a = post_a.rvs(100_000, random_state=rng)
samples_b = post_b.rvs(100_000, random_state=rng)
# P(B > A)
prob_b_beats_a = np.mean(samples_b > samples_a)
print(f"P(B > A) = {prob_b_beats_a:.4f}")
# Expected loss
loss_b = np.mean(np.maximum(samples_a - samples_b, 0))
loss_a = np.mean(np.maximum(samples_b - samples_a, 0))
print(f"Expected loss (choosing A) = {loss_a:.6f}")
print(f"Expected loss (choosing B) = {loss_b:.6f}")
# 95% Credible intervals
print(f"95% CI (A): [{post_a.ppf(0.025):.4f}, {post_a.ppf(0.975):.4f}]")
print(f"95% CI (B): [{post_b.ppf(0.025):.4f}, {post_b.ppf(0.975):.4f}]")
# Relative lift
lift = (post_b.mean() - post_a.mean()) / post_a.mean()
print(f"Relative lift: {lift:.2%}")
# Plot posterior distributions
x = np.linspace(
min(post_a.ppf(0.001), post_b.ppf(0.001)),
max(post_a.ppf(0.999), post_b.ppf(0.999)),
1000
)
plt.figure(figsize=(10, 6))
plt.plot(x, post_a.pdf(x), 'b-', lw=2, label=f'Control A: Beta({alpha_a}, {beta_a})')
plt.plot(x, post_b.pdf(x), color='orange', lw=2, label=f'Variant B: Beta({alpha_b}, {beta_b})')
plt.fill_between(x, post_a.pdf(x), alpha=0.15, color='blue')
plt.fill_between(x, post_b.pdf(x), alpha=0.15, color='orange')
plt.xlabel('Conversion Rate')
plt.ylabel('Density')
plt.title(f'Bayesian A/B Test - P(B>A) = {prob_b_beats_a:.1%}')
plt.legend()
plt.tight_layout()
plt.show()

Bayesian A/B testing uses Bayes' theorem to update prior beliefs about conversion rates with observed data. Unlike frequentist tests that output p-values, Bayesian tests give you the probability that one variant is better than another - a more intuitive metric for decision-making.
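For this conjugate model, P(B > A) can also be computed without Monte Carlo sampling, by numerically integrating the posterior densities. A minimal sketch, reusing the posterior parameters derived above (Beta(121, 881) for A, Beta(146, 856) for B):

```python
from scipy import stats
from scipy.integrate import quad

# Posteriors from the worked example: Beta(1+120, 1+880) and Beta(1+145, 1+855)
post_a = stats.beta(121, 881)
post_b = stats.beta(146, 856)

# P(B > A) = integral over [0, 1] of pdf_B(x) * CDF_A(x) dx
prob_b_beats_a, _ = quad(lambda x: post_b.pdf(x) * post_a.cdf(x), 0, 1)
print(f"Exact P(B > A) = {prob_b_beats_a:.4f}")
```

This removes the sampling noise of the Monte Carlo estimate and is useful for verifying it.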
Beta(1, 1) is the standard uninformative (uniform) prior, treating all conversion rates as equally likely. If you have historical data, use an informative prior like Beta(10, 90) for a ~10% baseline rate. The prior matters less as you collect more data.
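A sketch of the informative-prior case described above, using Beta(10, 90) (a ~10% baseline rate carrying the weight of 100 pseudo-observations) with the same counts as variant A:

```python
from scipy import stats

# Informative prior: ~10% baseline, worth 100 pseudo-observations
prior_alpha, prior_beta = 10, 90

visitors, conversions = 1000, 120  # same data as variant A above

# Conjugate update: add observed successes/failures to the prior counts
posterior = stats.beta(prior_alpha + conversions,
                       prior_beta + (visitors - conversions))
print(f"Posterior mean: {posterior.mean():.4f}")
```

The posterior mean lands between the observed 12.0% and the prior's 10%, pulled only slightly toward the prior because the data outweigh the 100 pseudo-observations ten to one.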
Expected loss quantifies the cost of making the wrong decision. If you choose variant B, the expected loss is E[max(θ_A - θ_B, 0)] - the average amount by which A could be better. A common decision rule: implement B when expected loss < 0.1% (your "cost of being wrong" threshold).
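The decision rule can be sketched in a few lines, reusing the posterior parameters from the worked example; the 0.1% threshold is the illustrative figure from the text, not a universal constant:

```python
import numpy as np

rng = np.random.default_rng(42)
samples_a = rng.beta(121, 881, 100_000)  # posterior for A (data above)
samples_b = rng.beta(146, 856, 100_000)  # posterior for B

threshold = 0.001  # 0.1% "cost of being wrong" threshold (illustrative)

# Expected loss of choosing B: average shortfall when A is actually better
loss_b = np.mean(np.maximum(samples_a - samples_b, 0))
if loss_b < threshold:
    print(f"Ship B: expected loss {loss_b:.6f} is below {threshold}")
else:
    print(f"Keep testing: expected loss {loss_b:.6f} exceeds {threshold}")
```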
Frequentist tests give you a p-value (probability of data given null hypothesis), while Bayesian tests give you P(B > A) - the direct probability one variant beats another. Bayesian tests also allow early stopping without inflating error rates, and provide credible intervals instead of confidence intervals.
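For contrast, a frequentist test on the same counts. This sketch uses Fisher's exact test from SciPy on the 2x2 contingency table:

```python
from scipy import stats

# Rows: variants A and B; columns: converted, not converted
table = [[120, 880],
         [145, 855]]
_, p_value = stats.fisher_exact(table)
print(f"Two-sided p-value: {p_value:.4f}")
```

Note what each number answers: the p-value measures how surprising the data are under the null hypothesis of no difference, while P(B > A) from the Bayesian analysis directly states how likely B is to be better.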
Unlike frequentist tests, you can check Bayesian results at any time without penalty. Common stopping rules: (1) P(B > A) > 95% or < 5%, (2) Expected loss < your threshold (e.g., 0.1%), or (3) The value of remaining information is less than the cost of continuing the test.
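Rules (1) and (2) can be wrapped in a small helper evaluated at each peek; `check_stopping` and its default thresholds are illustrative, not a standard API:

```python
import numpy as np

def check_stopping(conv_a, n_a, conv_b, n_b,
                   prob_bound=0.95, loss_threshold=0.001, n_samples=100_000):
    """Evaluate stopping rules (1) and (2) on the current counts.

    Uses a Beta(1, 1) prior; thresholds are illustrative defaults.
    """
    rng = np.random.default_rng(0)
    samples_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, n_samples)
    samples_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, n_samples)

    prob_b = np.mean(samples_b > samples_a)
    loss_b = np.mean(np.maximum(samples_a - samples_b, 0))
    loss_a = np.mean(np.maximum(samples_b - samples_a, 0))

    if prob_b > prob_bound or loss_b < loss_threshold:
        return "stop: ship B"
    if prob_b < 1 - prob_bound or loss_a < loss_threshold:
        return "stop: keep A"
    return "continue"

# Early peek with little data - neither rule fires yet
print(check_stopping(12, 100, 15, 100))    # → continue
# Full data from the worked example - the expected-loss rule fires
print(check_stopping(120, 1000, 145, 1000))  # → stop: ship B
```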