Mathematical Optimization

A bird’s-eye overview

A very short, very opinionated selection of topics

Diego Sandoval

whoami?

What is optimization?

Optimization is what you (want/try to) do with your salary at the end of the month.

An intuition

Best possible [under constraints]

Peace of mind

The main difference is that mathematical optimization gives you a guarantee.

But …

  • Formulation is not always possible
  • Infeasibility
  • Intractability
  • Optimal … but not quite
    • Locally optimal
    • Nearly optimal
    • Optimal for a proxy

Examples of optimization

Outline

  • Intro
  • Pen and paper optimization
  • Algebraic modeling languages
  • Outro
    • A teaser of a different topic

Intro

Isoperimetric problem

What is the maximum area that can be bounded with a curve of fixed length $L$?

$$ A \le \frac{L^{2}}{4\pi} $$

$$ A = \frac{L^{2}}{4\pi}? $$

$$ \iff A_{\bullet} = \frac{L_{\circ} ^{2}}{4\pi} $$
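A quick numeric sanity check that a circle of perimeter $L$ attains the bound (the value of $L$ below is arbitrary):

```python
import math

# A circle of perimeter L attains the isoperimetric bound L^2 / (4*pi).
L = 10.0
r = L / (2 * math.pi)          # radius of a circle with perimeter L
area_circle = math.pi * r**2   # its enclosed area
bound = L**2 / (4 * math.pi)   # the isoperimetric upper bound
print(area_circle, bound)      # the two values coincide
```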

Dido (~814 BC)

Queen of Carthāgō

Dido’s Carthāgō

Source map: brilliantmaps.com

Rewriting history

Let’s do some counterfactual reasoning:

What would Dido have done had the king imposed the constraint that the shape had to be rectangular?

us

Given a rectangle with perimeter $L$, what’s the relation between $\ell$ and $w$ that yields the maximum area?

$$ \begin{aligned} & \max_{\ell, w} & & \ell w \\ & \text{subject to} & & 2\ell + 2w = L \end{aligned} $$

Euclid (~300 BC) found that the relation that yields the maximum area is $w = \ell$, i.e., a square.

Pen and paper optimization

Counterfactual Carthāgō

$$ \begin{aligned} & \max_{\ell, w} & & \ell w \\ & \text{s.t.} & & 2\ell + 2w = L \end{aligned} $$

$$ \ell = \frac{L}{2}- w $$

$$ \begin{aligned} & \max_{w} & & -w^2 + \frac{L}{2}w \\ \end{aligned} $$

Optimization

$f(w)=-w^2 + \frac{L}{2} w$

Optimization $\to$ calculus $\to$ derivatives

$f(w)=-w^2 + \frac{L}{2} w$

$$ \begin{aligned} & \max_{w} & & -w^2 + \frac{L}{2}w \\ \end{aligned} $$

$$ \frac{df}{dw} = -2w + \frac{L}{2} = 0 $$

$$ L = 4w $$

$$ 2w + 2\ell = L = 4w $$

$$ w = \ell \quad \blacksquare $$
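The pen-and-paper result can be double-checked numerically by evaluating $f(w) = -w^2 + \frac{L}{2}w$ on a grid and locating the maximizer (the value of $L$ below is an arbitrary illustrative choice):

```python
import numpy as np

# Numerically maximize f(w) = -w^2 + (L/2) w and compare with the
# pen-and-paper optimum w* = L/4.
L = 12.0
f = lambda w: -w**2 + (L / 2) * w
ws = np.linspace(0.0, L / 2, 100_001)   # w ranges from 0 to L/2
w_star = ws[np.argmax(f(ws))]
print(w_star)  # close to L/4 = 3.0
```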

Algebraic modeling languages

Problem modeling strategy

Know thy solver

  • Options
  • Methods (algorithms)
  • Class of supported problems (LP, NLP, mixed integer, etc.)
  • Local or global

Algebraic modeling languages

  • AMPL
  • GAMS
  • GLPK
  • Sub-languages of general-purpose programming languages
    • Pyomo for Python
    • JuMP (Julia for Mathematical Programming)

$\mathbf{Q} = \mathbf{Q} - \frac{\mathbf{Q\gamma \gamma^{T} Q}}{\mathbf{\gamma^{T}Q\gamma}} + \frac{\mathbf{\delta \delta^{T}}}{ \mathbf{\delta^{T} \gamma}}$


# Python (NumPy), with gamma and delta as column vectors
Q = Q - (Q @ gamma @ gamma.T @ Q)/(gamma.T @ Q @ gamma)\
  + (delta @ delta.T)/(delta.T @ gamma)

# Julia
Q = Q - Q*γ*γ'*Q / (γ'*Q*γ) + δ*δ' / (δ'*γ)
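This rank-two update (a DFP-style inverse-Hessian update from quasi-Newton methods) satisfies the secant condition $\mathbf{Q}^{+}\gamma = \delta$, which a few lines of NumPy can verify; the matrix and vectors below are arbitrary illustrative values:

```python
import numpy as np

# Illustrative check of the rank-two update above, with gamma and delta
# as column vectors and Q symmetric positive definite.
Q = 2.0 * np.eye(4)
gamma = np.array([[1.0], [2.0], [3.0], [4.0]])
delta = np.array([[1.0], [1.0], [1.0], [1.0]])

Q_new = (Q
         - (Q @ gamma @ gamma.T @ Q) / (gamma.T @ Q @ gamma)
         + (delta @ delta.T) / (delta.T @ gamma))

# Secant condition: the updated matrix maps gamma to delta.
print(np.allclose(Q_new @ gamma, delta))  # True
```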

function euclid(α, β)
	return √(α^2 + β^2)
end

euclid(3, 4)
5

function 🦄(α, β)
	return √(α^2 + β^2)
end

🦄(3, 4)
5

# A bit off-topic ...
[Mojo]   hello.🔥

$$ \begin{aligned} & \max_{\ell, w} & & \ell w \\ & \text{subject to} & & 2w + 2\ell = L \end{aligned} $$

@variable(model, w  >= 0)
@variable(model, ℓ  >= 0)
@objective(model, Max, ℓ*w)
@constraint(model, 2w + 2ℓ == L)
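For comparison, the same model can be sketched in Python with `scipy.optimize` — not an algebraic modeling language, but close in spirit. $L$ is fixed to an arbitrary value here, and the maximization is expressed as minimizing the negated objective:

```python
from scipy.optimize import minimize

# max ℓ·w  s.t.  2ℓ + 2w = L, as a minimization of the negated objective.
L = 100.0
res = minimize(
    lambda v: -v[0] * v[1],            # v = (ℓ, w)
    x0=[1.0, 1.0],
    bounds=[(0.0, None), (0.0, None)],
    constraints=[{"type": "eq", "fun": lambda v: 2 * v[0] + 2 * v[1] - L}],
)
print(res.x)  # both components close to L/4 = 25: the square
```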

Some examples of JuMP

The not-so-happy path

  • Formulation is not always possible
  • Infeasibility
  • Intractability
  • Optimal … but not quite
    • Locally optimal
    • Nearly optimal
    • Optimal for a proxy

Sight Beyond Sight

Conclusions

Thank you!

Teaser

PPLs

Linear Regression $\to$ Least Squares [R1] [R2]

$$ \begin{align*} & \min_{\color{red}{\mathbf{\beta}}} & & \sum_{i=1} ^{n} \big(y_i - f(x_i, \mathbf{\beta})\big)^2 \end{align*} $$

In data veritas?

  • Expediency
  • Explainability
  • Transparency
  • Associations-only machine $\to$ causal analysis is not possible.
    • Examples:
      • Determining if global warming is due to human activity
      • Establishing the link between smoking and cancer

$$ \begin{align*} & \min_{\color{red}{\mathbf{\beta}}} & & \sum_{i=1} ^{n} \big(y_i - f(x_i, \mathbf{\beta})\big)^2 \end{align*} $$

$$ \begin{align*} \color{red}{\mathbf{\beta}} &= (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top}\mathbf{y} \end{align*} $$
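The closed form can be sketched in a few lines of NumPy. The data below are synthetic, with made-up true coefficients, purely for illustration:

```python
import numpy as np

# Fit y = b0 + b1*x via the normal equations, beta = (X'X)^{-1} X'y,
# on synthetic data with true b0 = 3, b1 = 2.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=200)
y = 3.0 + 2.0 * x + rng.normal(0, 0.1, size=200)

X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
beta = np.linalg.solve(X.T @ X, X.T @ y)    # solve (X'X) beta = X'y
print(beta)  # approximately [3, 2]
```

(In practice `np.linalg.lstsq` is preferred over forming $\mathbf{X}^{\top}\mathbf{X}$ explicitly, for numerical stability.)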

$$ \begin{align*} \color{red}{\beta_0} &= 113.88 \\ \color{red}{\beta_1} &= 0.9 \\ \end{align*} $$

Gauss

Bayesian analysis

$$ \text{Posterior} = \frac{\text{Probability of the data} \times \text{Prior}}{\text{Average probability of the data}} $$

$$ \begin{align*} \beta_0 & \sim \text{Normal}(178, 20)\\ \beta_1 & \sim \text{Log-Normal}(0,1)\\ \sigma & \sim \text{Uniform}(0, 50)\\ \end{align*} $$

$$ \begin{align*} \mu_i & = \beta_0 + \beta_1 (x_i - \bar{x})\\ \text{height} & \sim \text{Normal} (\mu_i, \sigma)\\ \end{align*} $$
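Before reaching for a PPL, the priors above can be sampled directly with NumPy to see what they imply about heights (a prior predictive check; the weight values below are illustrative assumptions):

```python
import numpy as np

# Draw from the priors above and simulate heights (prior predictive check).
rng = np.random.default_rng(0)
n = 1_000
b0 = rng.normal(178, 20, size=n)      # β0 ~ Normal(178, 20)
b1 = rng.lognormal(0, 1, size=n)      # β1 ~ Log-Normal(0, 1): always positive
sigma = rng.uniform(0, 50, size=n)    # σ ~ Uniform(0, 50)

x = 50.0                              # an illustrative weight
x_bar = 45.0                          # an illustrative sample mean
mu = b0 + b1 * (x - x_bar)
height = rng.normal(mu, sigma)
print(height.mean())
```

The Log-Normal prior on $\beta_1$ encodes the assumption that taller people are never predicted to be lighter: every sampled slope is positive.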

Distribution of $\color{red}{\beta_1}$

Probabilistic programming languages


import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist

def model(weight, height):
    # Priors
    b0 = numpyro.sample("b0", dist.Normal(178, 20))
    b1 = numpyro.sample("b1", dist.LogNormal(0, 1))
    sigma = numpyro.sample("sigma", dist.Uniform(0, 50))
    # Linear model on centered weight
    mu = b0 + b1 * (weight - jnp.mean(weight))
    # Likelihood
    numpyro.sample("height", dist.Normal(mu, sigma), obs=height)

Statistical Rethinking

Extra

Critical points, multiple minima

Constrained optimization $\to$ $x \in [a,b]$

$\max_x f(x) \equiv \min_x -f(x)$
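This equivalence is what lets minimization-only solvers handle maximization problems; a minimal check, with an arbitrary concave $f$:

```python
from scipy.optimize import minimize_scalar

# Maximizing f is the same as minimizing -f. Here f(x) = -(x - 2)^2 + 3,
# a concave function with its maximum f(2) = 3 (an illustrative choice).
f = lambda x: -(x - 2) ** 2 + 3
res = minimize_scalar(lambda x: -f(x))
print(res.x, -res.fun)  # close to (2, 3)
```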