spark-bestfit
==============

Modern distribution fitting library with pluggable backends (Spark, Ray, Local).

Automatically fit ~90 scipy.stats continuous distributions and 16 discrete distributions
to your data using parallel processing. Supports Apache Spark for production clusters,
Ray for ML workflows, or local execution for development.

**Supported Versions:**

- Python 3.11 - 3.13
- Apache Spark 3.5.x and 4.x
- Ray 2.x (optional)
- See :doc:`quickstart` for the full compatibility matrix

Scope & Limitations
-------------------

spark-bestfit is designed for **batch** fitting of statistical distributions.

**What it does well:**

- Fit ~90 continuous and 16 discrete scipy.stats distributions in parallel
- Multi-column fitting: fit multiple columns efficiently in a single operation
- Report goodness-of-fit metrics: Kolmogorov-Smirnov (KS), Anderson-Darling (A-D), AIC, BIC, and SSE
- Generate publication-ready visualizations (histograms, Q-Q plots, P-P plots)
- Compute bootstrap confidence intervals for parameters
- Scale to 100M+ rows with Spark or Ray backends
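
To make "fit many distributions and rank them" concrete, the following is a minimal
single-machine sketch that uses only ``numpy`` and ``scipy.stats``. It is **not** the
spark-bestfit API: the helper ``rank_distributions`` and the three candidate
distributions are assumptions chosen for illustration. See :doc:`quickstart` for the
library's actual interface.

.. code-block:: python

   import numpy as np
   from scipy import stats


   def rank_distributions(data, candidates):
       """Fit each candidate by maximum likelihood and rank by AIC (lower is better).

       Hypothetical helper for illustration; not part of spark-bestfit.
       """
       results = []
       for dist in candidates:
           # fit() returns the MLE parameters: any shape parameters, then loc, scale.
           params = dist.fit(data)
           log_lik = np.sum(dist.logpdf(data, *params))
           aic = 2 * len(params) - 2 * log_lik
           # One-sample KS test against the fitted CDF.
           ks = stats.kstest(data, dist.cdf, args=params)
           results.append(
               {
                   "name": dist.name,
                   "params": params,
                   "aic": float(aic),
                   "ks_statistic": float(ks.statistic),
               }
           )
       return sorted(results, key=lambda r: r["aic"])


   # Synthetic data drawn from a normal distribution (seeded for repeatability).
   rng = np.random.default_rng(0)
   data = rng.normal(loc=5.0, scale=2.0, size=2000)

   ranked = rank_distributions(data, [stats.norm, stats.laplace, stats.expon])
   for r in ranked:
       print(f"{r['name']:<8} AIC={r['aic']:.1f} KS={r['ks_statistic']:.4f}")

The loop over candidates is the part that parallelizes naturally, since each fit is
independent of the others.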

**Known limitations:**

- No real-time/streaming support (batch processing only)
- Parameters and metrics use 32-bit floats (~7 significant digits) for Spark serialization
  efficiency. Very small values (e.g., p-values < 1e-7) may lose precision.
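
The effect of 32-bit storage can be checked with the standard library alone. The sketch
below rounds a 64-bit Python float to 32 bits and back; the helper name ``to_float32``
is an assumption for illustration and says nothing about how spark-bestfit serializes
values internally.

.. code-block:: python

   import struct


   def to_float32(x: float) -> float:
       """Round a 64-bit Python float to the nearest 32-bit float and back."""
       return struct.unpack("f", struct.pack("f", x))[0]


   value = 0.123456789012
   stored = to_float32(value)

   # The relative rounding error of a normal float32 is at most 2**-24 (about 6e-8),
   # which is where the "~7 significant digits" figure comes from.
   rel_err = abs(stored - value) / value
   print(f"original={value!r} stored={stored!r} relative error={rel_err:.2e}")

   # Differences smaller than that resolution vanish: a value this close to 1.0
   # is stored as exactly 1.0.
   print(to_float32(1.0 - 1e-9) == 1.0)

If downstream logic compares metrics at finer resolution than this, it may be safer to
recompute them in 64-bit precision from the fitted parameters.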

.. toctree::
   :maxdepth: 2
   :caption: Getting Started

   quickstart
   backends
   usecases

.. toctree::
   :maxdepth: 2
   :caption: Features

   features/config
   features/custom-distributions
   features/bounded
   features/sampling
   features/serialization
   features/copula
   features/multivariate
   features/mixture
   features/progress
   features/lazy-metrics
   features/prefiltering
   features/heavy-tail
   features/mse-estimation
   features/diagnostics-plots

.. toctree::
   :maxdepth: 2
   :caption: Reference

   architecture
   performance
   migration
   api
   faq
   glossary
   adr/index

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`