Prepare Kaggle submissions
For the Rocky Worlds Director’s Discretionary Time Data Challenge.
Outline
Format posterior samples for submission
Format photometry for submission
Format the forms for submission
Combine the above into a single ZIP
In this example, we’ll draw random numbers to submit as our “results” for many fields. You should replace these with the real thing.
import numpy as np
rng = np.random.default_rng(0)
Components of a valid submission
The package defines Python objects for each file that makes up a submission. The objects are:
Posterior: contains posterior samples.
Photometry: contains the reduced time series flux, astrophysical model, noise model, and “full model” (astrophysical + noise), with optional additional time-series products such as target centroids or background.
Form: lists a series of questions about your reduction and analysis choices that must be answered for each target, in each submission.
Results: combines the above components and writes a ZIP archive that’s ready to submit to Kaggle.
import rocky_worlds_data_challenge as rw
1. Posterior samples
In the cell below we create some artificial samples.
# generate fake posterior samples:
n_parameters = 7
n_posterior_samples = 10_000
samples_shape = (n_parameters, n_posterior_samples)
samples_GJ_3929_b = rng.normal(1000, 10, size=samples_shape)
samples_LHS_1140_b = rng.normal(1000, 10, size=samples_shape)
samples_GJ_3929_b
If your sampler produces weighted samples, resample them to equal weight
before creating the Posterior object: Posterior should contain
equal-weight samples.
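One way to get equal-weight samples is to resample with replacement, drawing each weighted sample with probability proportional to its weight. The sketch below uses only NumPy; the function name `resample_equal_weight` and the demo arrays are hypothetical and not part of the challenge package. If your sampler ships its own resampling utility, that is likely the better choice.

```python
import numpy as np


def resample_equal_weight(weighted_samples, weights, n_draws, seed=0):
    """Resample weighted posterior samples into equal-weight samples.

    weighted_samples : array of shape (n_parameters, n_weighted_samples)
    weights : non-negative array of shape (n_weighted_samples,)
    """
    weighted_samples = np.asarray(weighted_samples, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (weighted_samples.shape[1],):
        raise ValueError("weights must have one entry per sample (column)")
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative and not all zero")

    # Normalize the weights into probabilities and draw column indices.
    probabilities = weights / weights.sum()
    rng = np.random.default_rng(seed)
    indices = rng.choice(weights.size, size=n_draws, replace=True, p=probabilities)
    return weighted_samples[:, indices]


# Demo with made-up weighted samples (7 parameters, 5000 weighted draws).
demo_rng = np.random.default_rng(1)
demo_weighted = demo_rng.normal(1000, 10, size=(7, 5000))
demo_weights = demo_rng.uniform(0, 1, size=5000)
demo_equal = resample_equal_weight(demo_weighted, demo_weights, n_draws=10_000)
print(demo_equal.shape)
```

The output keeps the (n_parameters, n_posterior_samples) orientation, so it can be passed on as the samples array.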
Now we write out parameter_keys, defining names for each of the sampling
parameters in the posterior samples array. The length of parameter_keys
must match the first dimension of the samples array above. A plain Python
list of strings is shown here, but NumPy arrays, pandas Index objects, tuples,
and other array-like containers of parameter names are also accepted; the
Posterior object will convert them to strings internally before writing the
submission files.
The expected sample shape is (n_parameters, n_posterior_samples). If your
sampler returns samples in the opposite orientation,
(n_posterior_samples, n_parameters), transpose the array before creating
the Posterior object. For example, use samples = samples.T.
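The orientation check can be sketched as a small helper that compares the array shape with the number of parameter names. The helper `orient_samples` and the demo chain are hypothetical illustrations, not part of the package; Posterior performs its own validation.

```python
import numpy as np


def orient_samples(samples, parameter_keys):
    """Return samples with shape (n_parameters, n_posterior_samples)."""
    samples = np.asarray(samples)
    n_keys = len(parameter_keys)
    if samples.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {samples.ndim} dimension(s)")
    # Note: a square array is ambiguous and is returned unchanged.
    if samples.shape[0] == n_keys:
        return samples
    if samples.shape[1] == n_keys:
        return samples.T
    raise ValueError(
        f"neither axis of shape {samples.shape} matches {n_keys} parameter keys"
    )


# Demo: a made-up chain stored as (n_posterior_samples, n_parameters).
demo_keys = ['depth_ecl', 't_ecl', 'b_ecl', 'per', 'ecosw', 'esinw', 'extra']
demo_chain = np.zeros((10_000, len(demo_keys)))
oriented = orient_samples(demo_chain, demo_keys)
print(oriented.shape)
```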
# Use one parameter name for each row of the samples array.
# Lists, tuples, NumPy arrays, and pandas Index objects are all accepted.
parameter_keys = [
    'depth_ecl',
    't_ecl',
    'b_ecl',
    'per',
    'ecosw',
    'esinw',
    'non-standard key',
]
posterior_GJ_3929_b = rw.Posterior(
    samples=samples_GJ_3929_b,
    parameter_keys=parameter_keys,
)
posterior_LHS_1140_b = rw.Posterior(
    samples=samples_LHS_1140_b,
    parameter_keys=parameter_keys,
)
The Posterior object will validate your inputs to make sure the dimensions
match expectations. If the inputs fail validation, Posterior will raise an
error.
What parameters and standard parameter_keys are supported?
The required posterior parameter is depth_ecl, the eclipse depth in parts
per million (ppm). The submission tools also understand the standard parameter
names defined by rw.EclipsingSystem, listed below.
If your analysis allowed eclipse depths to vary between eclipses, or if you
fit each eclipse observation separately, please combine those results into one
representative posterior for the average eclipse depth before submission. The
exact averaging procedure is up to your team, but the submitted
depth_ecl samples should describe the single average eclipse depth that
you want evaluated.
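As one illustration only, and not a recommended procedure, the sketch below builds samples of the unweighted mean depth from per-eclipse posteriors. It assumes each eclipse was fit separately (so the per-eclipse posteriors are independent) and that every fit produced the same number of equal-weight samples. The array `depth_per_eclipse` and its values are made up. If your eclipses were fit jointly, the draws are correlated and should be averaged draw by draw without shuffling; an inverse-variance weighted mean is another reasonable choice.

```python
import numpy as np

demo_rng = np.random.default_rng(42)

# Made-up depth posteriors [ppm] for three eclipses,
# shape (n_eclipses, n_posterior_samples).
true_depths = np.array([[90.0], [110.0], [100.0]])
depth_per_eclipse = demo_rng.normal(loc=true_depths, scale=20.0, size=(3, 10_000))

# Shuffle each eclipse's samples independently so that any ordering left
# over from the sampler does not couple the separate fits.
shuffled = demo_rng.permuted(depth_per_eclipse, axis=1)

# Each column is now one Monte Carlo draw of the unweighted mean depth.
depth_ecl_average = shuffled.mean(axis=0)
print(depth_ecl_average.shape)
```

The resulting one-dimensional array would become the depth_ecl row of the samples array.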
If your sampler uses different names, simply rename your posterior columns or
the entries in parameter_keys before creating the Posterior object. For
example, if your samples call the eclipse depth fpfs_ppm, use
depth_ecl in parameter_keys for that row.
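Renaming can be done with a plain dictionary lookup before the Posterior object is created. In this sketch only fpfs_ppm comes from the text above; the other sampler names (`t_sec`, `period`, `ramp_amplitude`) are hypothetical.

```python
# Names as a hypothetical sampler might report them, one per row of samples.
sampler_names = ['fpfs_ppm', 't_sec', 'period', 'ramp_amplitude']

# Map sampler-specific names onto the standard keys.
rename = {
    'fpfs_ppm': 'depth_ecl',
    't_sec': 't_ecl',
    'period': 'per',
}

# Names without a mapping are kept as non-standard keys.
parameter_keys_renamed = [rename.get(name, name) for name in sampler_names]
print(parameter_keys_renamed)
```

Because the list is rebuilt in the original order, each name still lines up with the same row of the samples array.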
Commonly useful standard keys include:
| Key | Meaning |
|---|---|
| depth_ecl | Eclipse depth [ppm]; required for grading. |
| t_ecl | Mid-eclipse time [BMJD_TDB]. |
|  | Mid-transit time [BMJD_TDB]. |
| per | Orbital period [days]. |
|  | Planet-to-star radius ratio. |
|  | Fractional transit depth (converted internally). |
| a | Semimajor axis in units of stellar radii. |
| inc | Orbital inclination [deg]. |
| b_tra | Transit impact parameter in units of stellar radii. |
| b_ecl | Eclipse impact parameter in units of stellar radii. |
| ecc, omega | Orbital eccentricity and longitude of periastron [deg]. |
| ecosw, esinw | Eccentricity times the cosine/sine of longitude of periastron. |
| secosw, sesinw | Square root of eccentricity times the cosine/sine of longitude of periastron. |
|  | Stellar density in units of solar density. |
|  | Transit duration [days]. |
|  | Eclipse duration [days]. |
You may include additional non-standard keys in parameter_keys; they will
be preserved in the posterior files.
The EclipsingSystem helper can also convert between several equivalent
parameterizations when enough information is provided. For example, it can
derive inclination (inc) from an impact parameter such as b_tra when it
also has the scaled semimajor axis a and the eccentricity/orientation
information, derive ecc and omega from ecosw/esinw or
secosw/sesinw, compute transit and eclipse impact parameters from
a and inc, and use duration plus impact parameter to recover a and
inc. These conversions are intended to make common posterior
parameterizations easier to compare, but your parameter_keys should still
use the standard names above so the submission tools know what each row
represents.
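For intuition, two of these conversions can be written with NumPy alone. This is a sketch of the standard textbook relations, not the EclipsingSystem implementation; the function names are hypothetical, and the package's conventions (for example the range of omega) may differ.

```python
import numpy as np


def ecc_omega_from_ecosw_esinw(ecosw, esinw):
    """Eccentricity and longitude of periastron [deg] from e*cos(w), e*sin(w)."""
    ecc = np.hypot(ecosw, esinw)
    omega_deg = np.degrees(np.arctan2(esinw, ecosw)) % 360
    return ecc, omega_deg


def inc_from_b_tra(b_tra, a, ecc=0.0, omega_deg=0.0):
    """Inclination [deg] from the transit impact parameter.

    Uses b_tra = a * cos(inc) * (1 - ecc**2) / (1 + ecc * sin(omega)),
    with a in units of stellar radii.
    """
    omega = np.radians(omega_deg)
    cos_inc = b_tra * (1 + ecc * np.sin(omega)) / (a * (1 - ecc**2))
    return np.degrees(np.arccos(cos_inc))


# Demo with made-up values: ecc = 0.1 and omega = 30 deg.
demo_ecc, demo_omega = ecc_omega_from_ecosw_esinw(
    0.1 * np.cos(np.radians(30.0)),
    0.1 * np.sin(np.radians(30.0)),
)
demo_inc = inc_from_b_tra(b_tra=0.5, a=10.0)  # circular orbit
print(demo_ecc, demo_omega, demo_inc)
```

Both functions accept arrays, so they can be applied to whole rows of posterior samples at once.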
# Programmatic list of standard EclipsingSystem parameter keys:
list(rw.EclipsingSystem.__dataclass_fields__)
2. Photometry
Each submission should include a Photometry object for each target. This
is where you provide the light curve products from your own reduction and
modeling: the time array, reduced flux and flux uncertainty, astrophysical
model, noise/systematics model, and the combined full model.
The arrays in a Photometry object should all describe the same time series
and therefore must have the same length. The example below uses placeholder
arrays so the notebook can run end-to-end; for a real submission, replace
these arrays with the values produced by your analysis. This object is the
container that writes those photometric products into the submission ZIP.
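Before building the object, it can help to confirm that all of your arrays are one-dimensional and share a length. The helper `check_equal_lengths` below is a hypothetical pre-check written with NumPy; it does not replace whatever validation Photometry itself performs.

```python
import numpy as np


def check_equal_lengths(**arrays):
    """Raise ValueError unless all arrays are 1-D with the same length."""
    shapes = {name: np.asarray(values).shape for name, values in arrays.items()}
    not_1d = [name for name, shape in shapes.items() if len(shape) != 1]
    if not_1d:
        raise ValueError(f"expected 1-D arrays, but these are not: {not_1d}")
    lengths = {name: shape[0] for name, shape in shapes.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"time-series lengths differ: {lengths}")
    return next(iter(lengths.values()))


# Demo with placeholder arrays of matching length.
demo_time = np.linspace(0, 1, 1500)
demo_flux = np.ones(1500)
n_checked = check_equal_lengths(time=demo_time, raw_flux=demo_flux)
print(n_checked)
```

The keyword names are used only in the error message, so the same call works for required and optional products alike.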
You may also include additional, non-mandatory time-series products as extra
keyword arguments. For example, if your analysis tracks the subtracted
sky/background level per integration, you can pass background=.... If you
measured target centroids or PSF widths, you can include
centroid_x=..., centroid_y=..., centroid_sx=..., and
centroid_sy=.... These optional arrays will be written as additional
datasets in the photometry HDF5 file. You are encouraged to add as many
additional time-series products as you feel are necessary to reproduce your
results.
# number of samples in the photometric time series:
n_time_series = 1500
phot_shape = (n_time_series, )
fake_times = np.linspace(0, 1, n_time_series)
fake_time_series = np.ones(phot_shape)
photometry_GJ_3929_b = rw.Photometry(
    # required
    time=fake_times,
    raw_flux=fake_time_series,
    raw_flux_err=fake_time_series,
    astro_model=fake_time_series,
    noise_model=fake_time_series,
    full_model=fake_time_series,
    # optional, additional time-series products
    # examples: target centroids, PSF widths, and background level
    centroid_x=fake_time_series,
    centroid_y=fake_time_series,
    centroid_sx=fake_time_series,
    centroid_sy=fake_time_series,
    background=fake_time_series,
)
photometry_LHS_1140_b = rw.Photometry(
    # required
    time=fake_times,
    raw_flux=fake_time_series,
    raw_flux_err=fake_time_series,
    astro_model=fake_time_series,
    noise_model=fake_time_series,
    full_model=fake_time_series,
    # optional, additional time-series products
    # examples: target centroids, PSF widths, and background level
    centroid_x=fake_time_series,
    centroid_y=fake_time_series,
    centroid_sx=fake_time_series,
    centroid_sy=fake_time_series,
    background=fake_time_series,
)
3. Forms
Each Kaggle submission must include two completed forms, one for the analysis of GJ 3929 b, and another for LHS 1140 b. The forms ask questions about your team, data reduction process, assumptions, and analysis.
The blank form template is packaged with rocky_worlds_data_challenge. The
recommended way to create a local editable copy is to load that packaged
template with Form.blank() and save it:
import rocky_worlds_data_challenge as rw
form = rw.Form.blank()
form.save("form_GJ3929b.json", overwrite=True, validate=False)
There are a few ways to fill in a form:
Load and save a blank form in Python with Form.blank().
Load an existing form in Python with Form(path='/path/to/form.json').
Interactively create a new form or edit an existing one with interactive_form().
Load a blank form in Python with Form.blank()
form = rw.Form.blank() creates a blank form to fill in. Each question is an
entry in form.dictionary.
You can fill in the form programmatically in Python.
form = rw.Form.blank()
# Each entry has a number as its key in form.dictionary,
# where the question number is formatted as a zero-padded string.
# Each entry is also a dictionary, containing several components:
# prompt, description, required, format, example, response.
form.dictionary['01']['response'] = "GJ 3929 b"
prompt = form.dictionary['01']['prompt']
response = form.dictionary['01']['response']
print(f"{prompt}: {response}")
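When filling in a form programmatically, a quick scan for unanswered required questions can catch omissions early. The sketch below works on a plain dictionary shaped like the entries described above. It assumes that `required` is a boolean and that a blank response is None or an empty string; both are assumptions, and the demo prompts are made up.

```python
def unanswered_required(form_dictionary):
    """Return the question numbers of required entries with no response."""
    missing = []
    for number, entry in sorted(form_dictionary.items()):
        response = entry.get('response')
        is_blank = response is None or (
            isinstance(response, str) and not response.strip()
        )
        if entry.get('required') and is_blank:
            missing.append(number)
    return missing


# Stand-in for form.dictionary with hypothetical prompts.
demo_dictionary = {
    '01': {'prompt': 'Hypothetical question A', 'required': True, 'response': 'GJ 3929 b'},
    '02': {'prompt': 'Hypothetical question B', 'required': True, 'response': ''},
    '03': {'prompt': 'Hypothetical question C', 'required': False, 'response': None},
}
missing_numbers = unanswered_required(demo_dictionary)
print(missing_numbers)
```

With a real form, the same function could be called as unanswered_required(form.dictionary), provided the assumptions above hold for the packaged template.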
Load an existing form in Python
If you’d like to start by modifying an existing form on your machine, use:
import rocky_worlds_data_challenge as rw
form = rw.Form(path='/path/to/form.json')
Interactively create and edit forms in the Jupyter notebook
Run the following to open an interactive widget in this Jupyter notebook for creating, editing, validating, and saving these JSON forms. The widget will appear below the executed cell.
The form will try to follow your notebook theme automatically. If your editor
renders the form with poor contrast, you can force a theme with
rw.interactive_form(theme='dark') or
rw.interactive_form(theme='light').
rw.interactive_form()
4. Results
All results are combined into a single object which can write out your submission into a ZIP archive.
results = rw.Results(
    posterior_GJ_3929_b=posterior_GJ_3929_b,
    photometry_GJ_3929_b=photometry_GJ_3929_b,
    form_GJ_3929_b=rw.Form(path='/path/to/form_GJ3929b.json'),
    posterior_LHS_1140_b=posterior_LHS_1140_b,
    photometry_LHS_1140_b=photometry_LHS_1140_b,
    form_LHS_1140_b=rw.Form(path='/path/to/form_LHS1140b.json'),
)
# write `submission.zip` to the current working directory
# (usually the directory that contains this notebook):
results.to_submission('submission.zip', overwrite=True)
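Before uploading, you may want to confirm that the archive opens cleanly and see what it contains. The sketch below uses only the standard-library zipfile module. The helper `list_archive` is hypothetical, and the demo runs on a throwaway archive because the file names inside a real submission are not documented here.

```python
import tempfile
import zipfile
from pathlib import Path


def list_archive(path):
    """Return (filename, size in bytes) for each member of a ZIP archive."""
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise ValueError(f"corrupt member in archive: {bad_member}")
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Demo on a throwaway archive with a single placeholder file.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / 'demo.zip'
    with zipfile.ZipFile(demo_path, 'w') as archive:
        archive.writestr('placeholder.txt', 'hello')
    demo_listing = list_archive(demo_path)
print(demo_listing)
```

For a real submission, call list_archive('submission.zip') after the archive has been written.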