Research Plan

Posted on Tue 27 May 2025 in Probability

emerson

Roadmap to Become a Machine Learning Researcher Focused on Embedded Optimal Transport

🧠 Phase 1: Mathematical and Theoretical Foundations

Goal: Build deep fluency in the theory that underpins optimal transport, geometric learning, and high-dimensional data analysis.

Core Topics

  • Measure theory & probability
      • “Probability with Martingales” (Williams)
      • “High-Dimensional Probability” (Vershynin)
  • Linear algebra & matrix analysis
  • Convex optimization
      • “Convex Optimization” (Boyd & Vandenberghe)
  • Differential geometry & manifolds
      • “Manifold Learning” notes (Bronstein, Coifman)
  • Optimal transport theory
      • Villani’s books: Topics in Optimal Transportation and Optimal Transport: Old and New
      • Peyré & Cuturi (2019), Computational Optimal Transport

Deliverables

  • Annotated summaries of 3 key OT theorems (e.g. Kantorovich duality; the statement is recalled below)
  • Personal reference cheatsheet of duality and convex geometry principles
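
For quick reference, the example theorem above can go straight into the cheatsheet. Kantorovich duality, in the standard form found in Villani and in Peyré & Cuturi, written in LaTeX:

    \min_{\pi \in \Pi(\mu, \nu)} \int c(x, y) \, \mathrm{d}\pi(x, y)
      \;=\;
    \sup_{\substack{\varphi, \psi \\ \varphi(x) + \psi(y) \le c(x, y)}}
      \int \varphi \, \mathrm{d}\mu + \int \psi \, \mathrm{d}\nu

The left-hand side is the primal problem over couplings of μ and ν; the right-hand side is the dual over pairs of potentials, with equality holding e.g. for lower semicontinuous costs on Polish spaces.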

🔧 Phase 2: Computational Tools for Modern ML

Goal: Build and automate pipelines for OT-based machine learning workflows.

Core Tooling

  • Python Ecosystem
      • NumPy, pandas, matplotlib, scikit-learn
      • PyTorch (+ Lightning), JAX
  • Optimal Transport Libraries
      • POT (Python Optimal Transport)
      • GeomLoss (for regularized losses on embedded manifolds; a minimal usage sketch follows this list)
      • OTT-JAX (scalable OT in JAX)
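
A minimal GeomLoss sketch, assuming only that geomloss and PyTorch are installed and that the embedded data arrive as point clouds (the tensors below are random placeholders):

    # Debiased Sinkhorn divergence between two point clouds with GeomLoss.
    import torch
    from geomloss import SamplesLoss

    x = torch.randn(500, 10)  # placeholder source embeddings
    y = torch.randn(500, 10)  # placeholder target embeddings

    # "sinkhorn" = entropy-regularized OT; blur sets the regularization scale
    loss_fn = SamplesLoss(loss="sinkhorn", p=2, blur=0.05)
    print(loss_fn(x, y))      # scalar tensor, differentiable w.r.t. x and y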

Engineering Stack

  • Git & GitHub workflows
  • Docker & FastAPI for model serving
  • MLflow or Weights & Biases for experiment tracking

Deliverables

  • Containerized OT pipeline on toy data
  • Comparative benchmark of Sinkhorn vs. exact OT on CIFAR embeddings
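
A hedged sketch of that benchmark using POT, with random vectors standing in for the CIFAR embeddings (the embedding pipeline itself is not shown):

    # Compare exact OT (linear program) against entropy-regularized Sinkhorn.
    import numpy as np
    import ot  # pip install pot

    rng = np.random.default_rng(0)
    xs = rng.normal(size=(300, 64))        # placeholder "source" embeddings
    xt = rng.normal(size=(300, 64)) + 0.5  # placeholder "target" embeddings

    a, b = ot.unif(len(xs)), ot.unif(len(xt))  # uniform marginals
    M = ot.dist(xs, xt)                        # squared-Euclidean cost matrix
    M /= M.max()                               # rescale costs for Sinkhorn stability

    exact_cost = ot.emd2(a, b, M)                    # exact OT cost
    sinkhorn_cost = ot.sinkhorn2(a, b, M, reg=1e-2)  # entropic approximation
    print(f"exact: {exact_cost:.4f}  sinkhorn: {sinkhorn_cost:.4f}")

Timing both calls while growing the sample size turns this into the intended benchmark.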

📊 Phase 3: Embedded OT & Manifold-Based Learning

Goal: Specialize in methods combining geometry, dimensionality reduction, and transport.

Topics & Papers to Reproduce

  • Wasserstein PCA
      • [Bigot et al., 2017] Wasserstein PCA of probability measures
  • OT on Manifolds
      • [Seguy et al., 2018] Large-scale optimal transport and mapping estimation
      • [Vayer et al., 2020] Fused Gromov-Wasserstein OT
  • Subspace-Embedded Transport
      • [Courty et al., 2016] OT for domain adaptation
      • [Bunne et al., 2022] Proximal Sinkhorn
  • JAX-based experiments
      • Implement sliced OT on PCA/UMAP embeddings (see the sketch after this list)
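
A NumPy/POT sketch of the sliced-OT experiment, intended as a stand-in for the eventual JAX version (the high-dimensional features below are random placeholders, and PCA plays the role of the embedding step):

    # Sliced Wasserstein distance between two PCA-embedded point clouds.
    import numpy as np
    import ot
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 128))        # placeholder features, domain A
    Y = rng.normal(size=(1000, 128)) + 0.3  # placeholder features, domain B

    pca = PCA(n_components=16).fit(np.vstack([X, Y]))
    Xe, Ye = pca.transform(X), pca.transform(Y)  # shared embedding space

    # Average of 1-D Wasserstein distances over random projections
    swd = ot.sliced_wasserstein_distance(Xe, Ye, n_projections=200, seed=0)
    print(f"sliced W2 on PCA embeddings: {swd:.4f}")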

Deliverables

  • Notebook series on reproducibility of embedded OT methods
  • GitHub repo comparing sliced vs. Gromov-Wasserstein transport
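
For the Gromov-Wasserstein side of that comparison, a minimal POT sketch on the same kind of placeholder point clouds (note that GW compares intra-domain distance matrices rather than a cross-domain cost):

    # Gromov-Wasserstein discrepancy between two point clouds via POT.
    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    Xe = rng.normal(size=(200, 16))        # placeholder embedded points, domain A
    Ye = rng.normal(size=(200, 16)) + 0.3  # placeholder embedded points, domain B

    C1, C2 = ot.dist(Xe, Xe), ot.dist(Ye, Ye)  # intra-domain distance matrices
    C1, C2 = C1 / C1.max(), C2 / C2.max()      # rescale for comparability
    p, q = ot.unif(len(Xe)), ot.unif(len(Ye))

    gw = ot.gromov.gromov_wasserstein2(C1, C2, p, q, loss_fun="square_loss")
    print(f"GW discrepancy: {gw:.4f}")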

📅 Phase 4: Research & Contribution

Goal: Develop novel use cases or extensions of embedded OT, focusing on interpretability, computational efficiency, or fairness.

Suggested Project Directions

  • OT-based fairness constraints for social science data
  • Transport-based metrics for representation drift in embeddings (a toy drift-score sketch follows this list)
  • Optimal transport in latent diffusion models
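
A toy sketch of the drift-metric idea, assuming embedding batches from two checkpoints are directly comparable (the names and shapes below are illustrative):

    # Transport-based drift score: exact squared 2-Wasserstein distance
    # between a reference batch of embeddings and a later batch.
    import numpy as np
    import ot

    def drift_score(ref_emb: np.ndarray, new_emb: np.ndarray) -> float:
        a, b = ot.unif(len(ref_emb)), ot.unif(len(new_emb))
        M = ot.dist(ref_emb, new_emb)  # squared-Euclidean costs
        return float(ot.emd2(a, b, M))

    rng = np.random.default_rng(0)
    ref = rng.normal(size=(256, 32))        # embeddings at checkpoint t0
    cur = 1.2 * rng.normal(size=(256, 32))  # embeddings at a later checkpoint
    print(drift_score(ref, ref), drift_score(ref, cur))  # 0 vs. a positive score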

Publication & Sharing

  • Reproduce and extend 1 arXiv paper
  • Write a technical blog series on OT + geometry
  • Submit to:
      • NeurIPS Workshop on Optimal Transport
      • ICLR Workshop on Geometrical ML
      • SciML, Journal of Machine Learning Research (JMLR)

🌐 Phase 5: Networking and Visibility

Goal: Connect with researchers in the OT and geometric ML communities.

Key Twitter / GitHub Accounts

  • Gabriel Peyré (@GabrielPeyre)
  • Marco Cuturi (@mcuturi)
  • Justin Solomon (@solomonencoding)
  • Rémi Flamary (@remi_flamary)
  • Geoffrey Schiebinger (cell OT research)

Community Involvement

  • Comment on arXiv submissions via SciRate
  • Attend NeurIPS, ICML, ICLR, and Optimal Transport Seminar Series
  • Host discussions / reading groups on topical OT papers

💼 Bonus: Lightweight Tools for Prototyping

  • Streamlit for visualizing transport maps (a toy app sketch follows this list)
  • Manim or matplotlib 3D for geodesic animations
  • JupyterLab + VSCode + SSH for dev workflow
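
A toy Streamlit sketch of the transport-map visualization idea (the file name app.py and all parameters are illustrative; run with "streamlit run app.py"):

    # Display an entropic transport plan between two small point clouds.
    import matplotlib.pyplot as plt
    import numpy as np
    import ot
    import streamlit as st

    st.title("Entropic transport plan")
    reg = st.slider("Sinkhorn regularization", 0.01, 1.0, 0.1)

    rng = np.random.default_rng(0)
    xs = rng.normal(size=(60, 2))        # toy source points
    xt = rng.normal(size=(60, 2)) + 1.0  # toy target points

    M = ot.dist(xs, xt)
    M /= M.max()
    plan = ot.sinkhorn(ot.unif(60), ot.unif(60), M, reg)  # coupling matrix

    fig, ax = plt.subplots()
    ax.imshow(plan, cmap="viridis")
    ax.set_xlabel("target index")
    ax.set_ylabel("source index")
    st.pyplot(fig)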