Prepare Kaggle submissions
==========================

For the Rocky Worlds Director's Discretionary Time Data Challenge.

Outline
-------

1. Format *posterior samples* for submission
2. Format *photometry* for submission
3. Format the *forms* for submission
4. Combine the above into a single ZIP

In this example, we'll draw random numbers to submit as our "results" for many
fields. You should replace these with the real thing.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)

Components of a valid submission
--------------------------------

The package defines Python objects for each file that makes up a submission.
The objects are:

1. ``Posterior``: contains posterior samples.
2. ``Photometry``: contains the reduced time series flux, astrophysical model,
   noise model, and "full model" (astrophysical + noise), with optional
   additional time-series products such as target centroids or background.
3. ``Form``: lists a series of questions about your reduction and analysis
   choices that must be answered for each target, in each submission.
4. ``Results``: combines the above components and writes a ZIP archive that's
   ready to submit to Kaggle.

.. code-block:: python

    import rocky_worlds_data_challenge as rw

1. Posterior samples
--------------------

In the cell below we create some artificial samples.

.. code-block:: python

    # generate fake posterior samples:
    n_parameters = 7
    n_posterior_samples = 10_000

    samples_shape = (n_parameters, n_posterior_samples)
    samples_GJ_3929_b = rng.normal(1000, 10, size=samples_shape)
    samples_LHS_1140_b = rng.normal(1000, 10, size=samples_shape)
    samples_GJ_3929_b

If your sampler produces weighted samples, ``Posterior`` should contain
equal-weight samples.

Now we write out ``parameter_keys``, defining names for each of the sampling
parameters in the posterior samples array. The length of ``parameter_keys``
must match the first dimension of the ``samples`` array above. A plain Python
list of strings is shown here, but NumPy arrays, pandas Index objects, tuples,
and other array-like containers of parameter names are also accepted; the
``Posterior`` object will convert them to strings internally before writing the
submission files.

The expected sample shape is ``(n_parameters, n_posterior_samples)``. If your
sampler returns samples in the opposite orientation,
``(n_posterior_samples, n_parameters)``, transpose the array before creating
the ``Posterior`` object. For example, use ``samples = samples.T``.

.. code-block:: python

    # Use one parameter name for each row of the samples array.
    # Lists, tuples, NumPy arrays, and pandas Index objects are all accepted.
    parameter_keys = [
        'depth_ecl',
        't_ecl',
        'b_ecl',
        'per',
        'ecosw',
        'esinw',
        'non-standard key',
    ]

    posterior_GJ_3929_b = rw.Posterior(
        samples=samples_GJ_3929_b,
        parameter_keys=parameter_keys,
    )
    posterior_LHS_1140_b = rw.Posterior(
        samples=samples_LHS_1140_b,
        parameter_keys=parameter_keys,
    )

The ``Posterior`` object will validate your inputs to make sure the dimensions
match expectations. If the inputs fail validation, ``Posterior`` will raise an
error.

What Parameters And Standard ``parameter_keys`` Are Supported?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The required posterior parameter is ``depth_ecl``, the eclipse depth in parts
per million (ppm). The submission tools also understand the standard parameter
names defined by ``rw.EclipsingSystem``, listed below.

If your analysis allowed eclipse depths to vary between eclipses, or if you
fit each eclipse observation separately, please combine those results into one
representative posterior for the average eclipse depth before submission. The
exact averaging procedure is up to your team, but the submitted
``depth_ecl`` samples should describe the single average eclipse depth that
you want evaluated.

If your sampler uses different names, simply rename your posterior columns or
the entries in ``parameter_keys`` before creating the ``Posterior`` object. For
example, if your samples call the eclipse depth ``fpfs_ppm``, use
``depth_ecl`` in ``parameter_keys`` for that row.

Commonly useful standard keys include:

======================= =====================================================
Key                     Meaning
======================= =====================================================
``depth_ecl``           Eclipse depth [ppm]; required for grading.
``t_ecl``               Mid-eclipse time [BMJD_TDB].
``t_tra``               Mid-transit time [BMJD_TDB].
``per``                 Orbital period [days].
``rprs``                Planet-to-star radius ratio.
``depth_tra``           Fractional transit depth; converted internally to
                        ``rprs = sqrt(depth_tra)``.
``a``                   Semimajor axis in units of stellar radii (``a/Rs``).
``inc``                 Orbital inclination [deg].
``b_tra``               Transit impact parameter in units of stellar radii.
``b_ecl``               Eclipse impact parameter in units of stellar radii.
``ecc``, ``omega``      Orbital eccentricity and longitude of periastron
                        [deg].
``ecosw``, ``esinw``    Eccentricity times the cosine/sine of longitude of
                        periastron.
``secosw``, ``sesinw``  Square root of eccentricity times the cosine/sine of
                        longitude of periastron.
``rho_star``            Stellar density in units of solar density.
``dur_tra``             Transit duration [days].
``dur_ecl``             Eclipse duration [days].
======================= =====================================================

You may include additional non-standard keys in ``parameter_keys``; they will
be preserved in the posterior files.

The ``EclipsingSystem`` helper can also convert between several equivalent
parameterizations when enough information is provided. For example, it can
derive inclination (``inc``) from an impact parameter such as ``b_tra`` when it
also has the scaled semimajor axis ``a`` and the eccentricity/orientation
information, derive ``ecc`` and ``omega`` from ``ecosw``/``esinw`` or
``secosw``/``sesinw``, compute transit and eclipse impact parameters from
``a`` and ``inc``, and use duration plus impact parameter to recover ``a`` and
``inc``. These conversions are intended to make common posterior
parameterizations easier to compare, but your ``parameter_keys`` should still
use the standard names above so the submission tools know what each row
represents.

.. code-block:: python

    # Programmatic list of standard EclipsingSystem parameter keys:
    list(rw.EclipsingSystem.__dataclass_fields__)

2. Photometry
-------------

Each submission should include a ``Photometry`` object for each target. This
is where you provide the light curve products from your own reduction and
modeling: the time array, reduced flux and flux uncertainty, astrophysical
model, noise/systematics model, and the combined full model.

The arrays in a ``Photometry`` object should all describe the same time series
and therefore must have the same length. The example below uses placeholder
arrays so the notebook can run end-to-end; for a real submission, replace
these arrays with the values produced by your analysis. This object is the
container that writes those photometric products into the submission ZIP.

You may also include additional, non-mandatory time-series products as extra
keyword arguments. For example, if your analysis tracks the subtracted
sky/background level per integration, you can pass ``background=...``. If you
measured target centroids or PSF widths, you can include
``centroid_x=...``, ``centroid_y=...``, ``centroid_sx=...``, and
``centroid_sy=...``. These optional arrays will be written as additional
datasets in the photometry HDF5 file. You are encouraged to add as many
additional time-series products as you feel are necessary to reproduce your
results.

.. code-block:: python

    # number of samples in the photometric time series:
    n_time_series = 1500
    phot_shape = (n_time_series, )

    fake_times = np.linspace(0, 1, n_time_series)
    fake_time_series = np.ones(phot_shape)

    photometry_GJ_3929_b = rw.Photometry(
        # required
        time=fake_times,
        raw_flux=fake_time_series,
        raw_flux_err=fake_time_series,
        astro_model=fake_time_series,
        noise_model=fake_time_series,
        full_model=fake_time_series,

        # optional, additional time-series products
        # examples: target centroids, PSF widths, and background level
        centroid_x=fake_time_series,
        centroid_y=fake_time_series,
        centroid_sx=fake_time_series,
        centroid_sy=fake_time_series,
        background=fake_time_series,
    )

    photometry_LHS_1140_b = rw.Photometry(
        # required
        time=fake_times,
        raw_flux=fake_time_series,
        raw_flux_err=fake_time_series,
        astro_model=fake_time_series,
        noise_model=fake_time_series,
        full_model=fake_time_series,

        # optional, additional time-series products
        # examples: target centroids, PSF widths, and background level
        centroid_x=fake_time_series,
        centroid_y=fake_time_series,
        centroid_sx=fake_time_series,
        centroid_sy=fake_time_series,
        background=fake_time_series,
    )

3. Forms
--------

Each Kaggle submission must include two completed forms, one for the analysis
of GJ 3929 b, and another for LHS 1140 b. The forms ask questions about your
team, data reduction process, assumptions, and analysis.

The blank form template is packaged with ``rocky_worlds_data_challenge``. The
recommended way to create a local editable copy is to load that packaged
template with ``Form.blank()`` and save it:

.. code-block:: python

    import rocky_worlds_data_challenge as rw

    form = rw.Form.blank()
    form.save("form_GJ3929b.json", overwrite=True, validate=False)

There are a few ways to fill in a form:

* Load and save a blank form in Python with ``Form.blank()``.
* Load an existing form in Python with ``Form(path='/path/to/form.json')``.
* Interactively create a new form or edit an existing one with
  ``interactive_form()``.

Load a blank form in Python with ``Form.blank()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``form = Form.blank()`` creates a blank form to fill in. Each question is an
entry in ``form.dictionary``.

You can fill in the form programmatically in Python.

.. code-block:: python

    form = rw.Form.blank()

    # Each entry has a number as its key in form.dictionary,
    # where the question number is formatted as a zero-padded string.

    # Each entry is also a dictionary, containing several components:
    # prompt, description, required, format, example, response.

    form.dictionary['01']['response'] = "GJ 3929 b"

    prompt = form.dictionary['01']['prompt']
    response = form.dictionary['01']['response']

    print(f"{prompt}: {response}")

Load an existing form in Python
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you'd like to start by modifying an existing form on your machine, use:

.. code-block:: python

    import rocky_worlds_data_challenge as rw

    form = rw.Form(path='/path/to/form.json')

Interactively create and edit forms in the Jupyter notebook
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run the following to open an interactive widget in this Jupyter notebook for
creating, editing, validating, and saving these JSON forms. The widget will
appear below the executed cell.

The form will try to follow your notebook theme automatically. If your editor
renders the form with poor contrast, you can force a theme with
``rw.interactive_form(theme='dark')`` or
``rw.interactive_form(theme='light')``.

.. code-block:: python

    rw.interactive_form()

4. Results
----------

All results are combined into a single object which can write out your
submission into a ZIP archive.

.. code-block:: python

    results = rw.Results(
        posterior_GJ_3929_b=posterior_GJ_3929_b,
        photometry_GJ_3929_b=photometry_GJ_3929_b,
        form_GJ_3929_b=rw.Form(path='/path/to/form_GJ3929b.json'),
        posterior_LHS_1140_b=posterior_LHS_1140_b,
        photometry_LHS_1140_b=photometry_LHS_1140_b,
        form_LHS_1140_b=rw.Form(path='/path/to/form_LHS1140b.json'),
    )

    # write `submission.zip` to the same directory as this notebook:
    results.to_submission('submission.zip', overwrite=True)