Flux Gap Filling Tool

Overview
--------
This tool fills gaps in micrometeorological flux data, such as
evapotranspiration (ET), sensible heat flux (H), and latent heat flux (LE).
It supports two widely used methods:

1. MDS (Marginal Distribution Sampling):
   - Classical non-parametric gap-filling approach used in many flux networks.
   - Searches for donor values with similar environmental conditions at the
     same hour of day across seasonal windows (±7, ±14, ±30, and ±90 days).
   - Applies environmental similarity constraints (sr, vpd, at, rh, ws, rain).
   - Falls back to the mean diurnal cycle when no suitable matches are found.

2. RF (Random Forest–Based Gap Filling):
   - Trains a Random Forest using non-missing flux observations and
     environmental drivers.
   - Captures non-linear flux–meteorology relationships.
   - Enforces nighttime physical constraints (e.g., near-zero ET/LE/H at night).
   - Provides feature-importance diagnostics for interpretability.
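
The MDS donor search from item 1 can be sketched roughly as follows. This is
a minimal illustration, not the tool's actual implementation. The record
layout, the function name `mds_fill`, the tolerance values, and the choice to
widen the window step by step are all assumptions, and only three of the
drivers are checked to keep the example short.

```python
from statistics import mean

# Hypothetical similarity tolerances; the tool's real thresholds are not
# documented here.
TOLERANCES = {"sr": 50.0, "vpd": 0.5, "at": 2.5}
WINDOWS_DAYS = (7, 14, 30, 90)  # window half-widths named in the overview


def mds_fill(records, target, flux="LE"):
    """Return (value, source) for records[target], whose flux is missing.

    records: list of dicts with "day" (int), "hour" (int), driver values,
    and the flux value (None when missing).
    """
    gap = records[target]
    for window in WINDOWS_DAYS:
        donors = [
            r[flux]
            for r in records
            if r[flux] is not None
            and r["hour"] == gap["hour"]
            and abs(r["day"] - gap["day"]) <= window
            and all(
                r.get(k) is not None
                and gap.get(k) is not None
                and abs(r[k] - gap[k]) <= tol
                for k, tol in TOLERANCES.items()
            )
        ]
        if donors:
            return mean(donors), f"mds_{window}d"
    # Fallback: mean diurnal cycle, here the average of every observed
    # value at the same hour of day.
    same_hour = [
        r[flux] for r in records
        if r[flux] is not None and r["hour"] == gap["hour"]
    ]
    if same_hour:
        return mean(same_hour), "mean_diurnal"
    return None, "unfilled"


records = [
    {"day": 1, "hour": 12, "sr": 600.0, "vpd": 1.5, "at": 20.0, "LE": 200.0},
    {"day": 2, "hour": 12, "sr": 610.0, "vpd": 1.6, "at": 21.0, "LE": None},
    {"day": 3, "hour": 12, "sr": 620.0, "vpd": 1.4, "at": 20.5, "LE": 220.0},
    {"day": 4, "hour": 12, "sr": 100.0, "vpd": 0.2, "at": 10.0, "LE": 50.0},
]
value, source = mds_fill(records, target=1)
```

In this toy data the overcast day 4 is rejected as a donor because its drivers
differ too much, so the gap on day 2 is filled from days 1 and 3 only.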

Additionally, the tool includes a **Compare Models** mode:
   - Performs a masked-data experiment (observed values are hidden and then
     refilled) to compare MDS and RF using MAE and RMSE.
   - Recommends the best-performing method per flux variable.
   - Allows variable-specific model selection for final gap filling.
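
A masked-data comparison of this kind might look like the sketch below. It is
an illustration under stated assumptions, not the tool's code: the function
names, the 20% masking fraction, and the choice of RMSE as the ranking metric
are hypothetical, and `mean_fill` and `nearest_fill` are simple stand-ins for
the real MDS and RF fillers.

```python
import math
import random


def mae(truth, pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


def rmse(truth, pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def compare_methods(series, fillers, mask_fraction=0.2, seed=0):
    """Hide a share of observed values, refill them, and score each method.

    series:  list of floats, with None marking real gaps.
    fillers: dict mapping a method name to a callable that takes the masked
             series and returns a fully filled list of the same length.
    """
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(series) if v is not None]
    n_hide = max(1, int(len(observed) * mask_fraction))
    hidden = sorted(rng.sample(observed, n_hide))
    hidden_set = set(hidden)
    masked = [None if i in hidden_set else v for i, v in enumerate(series)]
    truth = [series[i] for i in hidden]

    scores = {}
    for name, fill in fillers.items():
        filled = fill(masked)
        pred = [filled[i] for i in hidden]
        scores[name] = {"MAE": mae(truth, pred), "RMSE": rmse(truth, pred)}
    # Assumption: the method with the lowest RMSE is the one recommended.
    best = min(scores, key=lambda name: scores[name]["RMSE"])
    return scores, best


def mean_fill(s):
    """Stand-in filler: replace every gap with the overall mean."""
    known = [v for v in s if v is not None]
    m = sum(known) / len(known)
    return [m if v is None else v for v in s]


def nearest_fill(s):
    """Stand-in filler: copy the nearest observed value."""
    known = [i for i, v in enumerate(s) if v is not None]
    return [
        s[min(known, key=lambda k: abs(k - i))] if v is None else v
        for i, v in enumerate(s)
    ]


series = [float(i) for i in range(50)]
scores, best = compare_methods(series, {"mean": mean_fill, "nearest": nearest_fill})
```

Running the experiment once per flux variable yields one recommendation per
variable, which matches the variable-specific selection described above.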

The tool produces:
   - A gap-filled dataset,
   - Statistics summarizing filling performance,
   - Model comparison tables (when selected),
   - Interactive scientific plots.

Data Format
-----------
Upload a single Excel (.xlsx) file containing:

   Sheet1 – Flux Data:
      - Required: "date", "time"
      - Flux variables: ET, H, LE (only variables present will be processed)

   Sheet2 – Environmental Drivers:
      - Required: "date", "hour"
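      - Recommended predictors:
           sr (solar radiation), vpd (vapor pressure deficit),
           at (air temperature), rh (relative humidity),
           ws (wind speed), rain (rainfall)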

The tool merges both sheets into a unified timestamped dataset. Datasets
spanning up to ~3 years can be processed in a single run. Split longer records
into smaller segments for better performance.
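
The merge step can be pictured with the sketch below, which joins the two
sheets on a shared hourly timestamp. It is only an illustration: the function
name `merge_sheets`, the ISO date format ("YYYY-MM-DD"), the "HH:MM" time
format, and the integer 0-23 "hour" column are assumptions, and reading the
workbook itself is left out.

```python
from datetime import datetime


def merge_sheets(flux_rows, driver_rows):
    """Join flux rows and driver rows on an hourly timestamp.

    flux_rows:   dicts with "date", "time", and flux columns (Sheet1).
    driver_rows: dicts with "date", "hour", and driver columns (Sheet2).
    """
    drivers = {}
    for row in driver_rows:
        day = datetime.strptime(row["date"], "%Y-%m-%d")
        ts = day.replace(hour=int(row["hour"]))
        drivers[ts] = {k: v for k, v in row.items() if k not in ("date", "hour")}

    merged = []
    for row in flux_rows:
        ts = datetime.strptime(f'{row["date"]} {row["time"]}', "%Y-%m-%d %H:%M")
        record = {"timestamp": ts}
        record.update({k: v for k, v in row.items() if k not in ("date", "time")})
        # Flux rows without a matching driver row are kept; their driver
        # columns are simply absent.
        record.update(drivers.get(ts.replace(minute=0), {}))
        merged.append(record)
    return sorted(merged, key=lambda r: r["timestamp"])


flux_rows = [
    {"date": "2023-06-01", "time": "13:00", "LE": 210.0},
    {"date": "2023-06-01", "time": "12:00", "LE": None},
]
driver_rows = [
    {"date": "2023-06-01", "hour": 12, "sr": 650.0, "at": 24.1},
]
merged = merge_sheets(flux_rows, driver_rows)
```

Keeping unmatched flux rows (a left join) is a design choice made for this
sketch so that missing drivers never shorten the flux record.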

Gap Statistics
--------------
For each flux variable, the tool reports:

   - Original Gap (%):
       Percentage of time steps where the flux was missing before gap filling.

   - Gaps Filled (%):
       Percentage of the original missing values successfully filled by the
       selected method. Values close to 100% indicate a nearly complete,
       analysis-ready dataset.
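
As a worked example, the two statistics can be computed as sketched below.
The function name `gap_statistics` and the list-with-None representation are
assumptions made for illustration, as is the convention of reporting 0.0 when
a series has no gaps.

```python
def gap_statistics(original, filled):
    """Compute both gap statistics for one flux variable.

    original, filled: equal-length lists, with None marking missing values.
    """
    n_total = len(original)
    gap_idx = [i for i, v in enumerate(original) if v is None]
    n_filled = sum(1 for i in gap_idx if filled[i] is not None)
    return {
        # Share of all time steps that were missing before filling.
        "Original Gap (%)": 100.0 * len(gap_idx) / n_total if n_total else 0.0,
        # Share of those missing steps that now hold a value.
        # Assumed convention: 0.0 when there was nothing to fill.
        "Gaps Filled (%)": 100.0 * n_filled / len(gap_idx) if gap_idx else 0.0,
    }


original = [1.0, None, None, 4.0, None]
filled = [1.0, 2.0, 3.0, 4.0, None]
stats = gap_statistics(original, filled)
```

Here three of five time steps are missing (60%), and two of those three gaps
are filled (about 66.7%).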

Plot Endpoints
--------------
   - Time Series Plot:
       Shows the continuous record of original and gap-filled fluxes.
       Useful for assessing how well missing periods were reconstructed,
       detecting seasonal trends, and identifying unrealistic spikes or patterns.

   - Fingerprint Plot:
       A two-dimensional visualization (hour-of-day × day-of-year) revealing
       diurnal cycles, seasonal dynamics, and the effect of gap-filling.
       Highlights persistent gaps, transitional periods, and structural patterns
       in the flux time series.

   - RF Feature Importance Plot:
       Displays the relative influence of each environmental predictor used
       in the Random Forest model. Helps interpret which meteorological
       variables control ET/H/LE variability and supports scientific evaluation
       of model realism.
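
The layout behind a fingerprint plot can be sketched as a grid with one row
per hour of day and one column per day of year. This is an assumed
illustration, not the tool's plotting code: the function name
`fingerprint_grid` is hypothetical, hourly data for a single calendar year is
assumed, and drawing the grid as a heatmap is left to whichever plotting
library is in use.

```python
import calendar
from datetime import datetime


def fingerprint_grid(timestamps, values, year):
    """Arrange one year of values as grid[hour_of_day][day_of_year - 1].

    Cells with no data stay None, so persistent gaps remain visible.
    If several values share an hour, the last one wins.
    """
    n_days = 366 if calendar.isleap(year) else 365
    grid = [[None] * n_days for _ in range(24)]
    for ts, value in zip(timestamps, values):
        if ts.year == year:
            grid[ts.hour][ts.timetuple().tm_yday - 1] = value
    return grid


timestamps = [
    datetime(2023, 1, 1, 0),
    datetime(2023, 1, 1, 13),
    datetime(2023, 12, 31, 23),
]
values = [0.0, 150.0, 2.0]
grid = fingerprint_grid(timestamps, values, year=2023)
```

Building the grid once from the original series and once from the gap-filled
series makes the effect of gap filling easy to compare side by side.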


References
----------
1. Reichstein, M., Falge, E., Baldocchi, D., Papale, D., Aubinet, M., Berbigier,
P., Bernhofer, C., Buchmann, N., Gilmanov, T., Granier, A. and Grünwald, T.,
2005. On the separation of net ecosystem exchange into assimilation and
ecosystem respiration: review and improved algorithm. Global Change Biology,
11(9), pp.1424-1439.

2. Papale, D., Reichstein, M., Aubinet, M., Canfora, E., Bernhofer, C.,
Kutsch, W., Longdoz, B., Rambal, S., Valentini, R., Vesala, T. and Yakir,
D., 2006. Towards a standardized processing of Net Ecosystem Exchange measured
with eddy covariance technique: algorithms and uncertainty estimation.
Biogeosciences, 3(4), pp.571-583.

3. Breiman, L., 2001. Random forests. Machine Learning, 45(1), pp.5-32.

4. Tramontana, G., Jung, M., Schwalm, C.R., Ichii, K., Camps-Valls,
G., Ráduly, B., Reichstein, M., Arain, M.A., Cescatti, A., Kiely,
G. and Merbold, L., 2016. Predicting carbon dioxide and energy fluxes
across global FLUXNET sites with regression algorithms.
Biogeosciences, 13(14), pp.4291-4313.

Contact
-------
For questions, feedback, or suggestions:
   Dr. Srinivasa Rao Peddinti
   Department of Land, Air, and Water Resources
   University of California, Davis
   Email: speddinti@ucdavis.edu