Company Ingrid Solberg

What We Learned from Our First Six Pilot Deployments

A candid retrospective on what worked and what we had to rebuild after our first six customer pilots: integration surprises, asset data quality gaps, and the calibration lessons that shaped our current onboarding process.

We started taking pilot customers in early 2023. By the end of that year, we had completed six deployments across four states — three solar installations, two wind, and one mixed portfolio. Each was a 30-day pilot against actual SCADA data, with the objective of delivering calibrated P10/P50/P90 forecasts integrated into the customer's existing data infrastructure before any commercial discussion.

What we expected going in, and what we found, diverged in specific ways. This is a candid account of those differences — not a polished success story, but the actual lessons that changed how we build the platform.

The asset data was almost never what we were told it would be

Every pilot kickoff conversation included a data inventory: GPS coordinates, panel specifications or turbine model, tilt and azimuth angles, the EMS or historian system in use. In five of the six pilots, at least one critical parameter in that inventory was wrong, missing, or out of date by the time we connected to the actual system.

The most common issue was tilt/azimuth discrepancy. A utility-scale solar site with 45,000 panels does not have a single tilt angle — trackers have varying degradation rates and stow modes, and the as-built configuration sometimes differs from the as-designed spec that procurement teams hand over. In one pilot at a 75 MW fixed-tilt site in southern Colorado, the as-built tilt angles for two of seven inverter blocks differed by 4–6° from the records we received. That discrepancy was small enough to be invisible in the nameplate spec but created a systematic morning-period GHI-to-DC conversion error that pushed our first week of forecasts 3–4% high before we corrected it against SCADA actuals.

The lesson: we now require as-built plant documentation reviewed against at least 30 days of SCADA output before signing off on initial forecast parameters. Generic NWP-to-power-curve conversion on nominal specs is a starting point, not an operational input.
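One way to surface this kind of systematic error is to group forecast and SCADA values by hour of day and look for hours where the forecast runs consistently high or low. The sketch below is illustrative only and is not taken from our production pipeline; the function name `relative_bias_by_hour`, the record layout, and the 1 MW cutoff for near-zero output are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def relative_bias_by_hour(records, min_actual_mw=1.0):
    """Return {hour: (sum forecast - sum actual) / sum actual}.

    records: iterable of (timestamp, forecast_mw, actual_mw) tuples.
    min_actual_mw: hypothetical cutoff that drops night and near-zero
    intervals, where a relative error is not meaningful.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # hour -> [forecast sum, actual sum]
    for ts, forecast_mw, actual_mw in records:
        if actual_mw < min_actual_mw:
            continue
        sums[ts.hour][0] += forecast_mw
        sums[ts.hour][1] += actual_mw
    return {hour: (f - a) / a for hour, (f, a) in sorted(sums.items())}


if __name__ == "__main__":
    sample = [
        (datetime(2023, 6, 1, 8, 0), 10.4, 10.0),   # morning, forecast high
        (datetime(2023, 6, 1, 12, 0), 50.0, 50.0),  # midday, no bias
    ]
    for hour, bias in relative_bias_by_hour(sample).items():
        print(f"{hour:02d}:00  {bias:+.1%}")
```

A positive bias concentrated in the morning hours, with little bias at midday, is the pattern a tilt or azimuth mismatch would be expected to produce. A flat bias across all hours points elsewhere, for example at a capacity or loss assumption.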

Historian connectivity was harder than the vendors said it would be

Three pilots involved integrating forecast delivery into an existing time-series historian: two OSIsoft PI installations and one AVEVA System Platform. The integration documentation for both platforms is thorough. What the documentation does not cover is the gap between a test environment and a production historian with years of accumulated tag naming conventions, access control configurations, and IT network segmentation decisions.

In one case, the production PI Server was on a network segment physically separated from the corporate IT zone where our API connector was expected to run, and the network team's rule set had never anticipated a third-party vendor needing write access from outside that segment. Getting the right firewall exceptions approved took 11 days, more than a third of the pilot window. We ran the pilot in parallel via CSV export to a shared drive as a fallback, which worked but was not the integration experience either party wanted.

We've since built our integration process to include a network access checklist in the pre-pilot scoping call, and we treat historian connectivity as a critical path item with an explicit timeline dependency rather than an assumed-solved technical detail.

SCADA data quality was the forecast accuracy bottleneck, not the NWP layer

Before the first pilots, our internal accuracy assumption was that the primary driver of forecast error would be NWP model skill — GFS or ECMWF representing cloud systems accurately enough for the bias correction layer to do its job. The reality was that for the first 4–6 weeks of each pilot, SCADA data quality issues in the historical training set were a larger driver of ML calibration error than NWP uncertainty.

The most operationally impactful issue was stuck sensor readings — inverter output channels that reported a constant value for multi-hour periods before resetting, likely due to communication timeouts between field devices and the historian. These stuck values look superficially like real production data unless you apply rate-of-change outlier detection, but they teach the correction model the wrong relationship between NWP input and observed output during those intervals.

We now run an automated data quality scan on every SCADA actuals feed before adding it to the training window. The scan checks for: stuck values, implausible ramp rates (more than 20% nameplate per 5-minute interval without a corresponding irradiance change), negative output values (inverter communication artifacts), and timestamp gaps longer than two consecutive intervals. Flagged intervals are excluded from ML training and filled with NWP-direct estimates for that window. The result is a cleaner training set but one that requires approximately 30% more calendar days to accumulate the same effective sample size we assumed from nominal 5-minute interval history.
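The four checks described above can be sketched roughly as follows. This is a simplified illustration, not the scan we run in production. The function name `scan_scada`, the tuple layout, the two-hour stuck-value window (`stuck_run=24` at 5-minute resolution), and the 20% irradiance-change tolerance (`irr_frac`) are assumptions chosen for the example. Only the 20% nameplate ramp limit and the two-interval gap limit come from the text.

```python
from datetime import datetime, timedelta


def scan_scada(samples, nameplate_mw, interval=timedelta(minutes=5),
               stuck_run=24, ramp_frac=0.20, irr_frac=0.20):
    """Flag suspect SCADA intervals.

    samples: time-sorted list of (timestamp, output_mw, ghi_w_m2).
    Returns a list of sets of flag names, one set per sample.
    """
    flags = [set() for _ in samples]
    run_start = 0  # index where the current run of identical values began
    for i, (ts, mw, ghi) in enumerate(samples):
        if mw < 0:
            flags[i].add("negative")
        if i == 0:
            continue
        prev_ts, prev_mw, prev_ghi = samples[i - 1]

        # Number of intervals missing between the previous sample and this one.
        missing = round((ts - prev_ts) / interval) - 1
        if missing > 2:
            flags[i].add("gap_before")

        # Large output ramp with no comparable irradiance change.
        if missing == 0 and abs(mw - prev_mw) > ramp_frac * nameplate_mw:
            irr_change = abs(ghi - prev_ghi) / max(ghi, prev_ghi, 1.0)
            if irr_change < irr_frac:
                flags[i].add("ramp")

        # Stuck values: a long run of identical non-zero readings.
        # Zero output is skipped so that normal night-time data is not flagged.
        if mw != prev_mw or missing > 0:
            run_start = i
        elif mw > 0 and i - run_start + 1 >= stuck_run:
            for j in range(run_start, i + 1):
                flags[j].add("stuck")
    return flags


def training_mask(flags):
    """True where an interval is clean enough to use for ML training."""
    return [not f for f in flags]
```

One caveat with this simple version: output clipped at inverter capacity can also be constant for long periods, so a real scan would need some way to distinguish clipping from a stuck channel before excluding those intervals.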

Grid operators wanted confidence bands, not point forecasts — but not all the same bands

We had designed the initial pilot output format around P10/P50/P90 bands delivered at 15-minute resolution, which is what the published academic literature on probabilistic solar forecasting treats as the standard. What we discovered across the six pilots was that different operators had different mental models of what to do with those three numbers, and the mapping from three quantiles to an operational decision varied significantly.

Two of our wind customers wanted a single "commit number" derived from the probabilistic output, not the bands themselves — they needed something their EMS could ingest as a deterministic schedule without requiring the dispatch engineer to interpret spread width in real time. Two of the solar customers specifically wanted wider tails: P5 and P95 in addition to the inner bands, because their reserve commitment framework used the outer quantiles to trigger additional reserve procurement actions.

One customer — a mixed solar/wind portfolio manager — wanted the probability of production falling below a specific absolute MW threshold, which is a conditional probability calculation rather than a fixed quantile output. That required a small but real change to our output API to support threshold queries against the distribution, not just fixed percentile reads.
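To make the difference concrete, here is a minimal sketch of a threshold query that estimates the probability of production falling at or below a given MW value by interpolating linearly between stored quantiles. The function name `prob_below` and the dictionary layout are hypothetical, and linear interpolation between quantiles is a simplifying assumption for the example, not a description of how our API computes it. The sketch also assumes the non-exceedance convention, where P10 is the value production stays at or below 10% of the time.

```python
def prob_below(threshold_mw, quantiles):
    """Estimate P(production <= threshold_mw) from a few quantiles.

    quantiles: {probability: mw}, e.g. {0.10: 12.0, 0.50: 20.0, 0.90: 26.0}.
    Outside the stored range the tail shape is unknown, so the result is
    clamped to the nearest stored probability and should be read as a bound.
    """
    pts = sorted((mw, p) for p, mw in quantiles.items())
    if threshold_mw <= pts[0][0]:
        return pts[0][1]  # true probability is at most this value
    for (lo_mw, lo_p), (hi_mw, hi_p) in zip(pts, pts[1:]):
        if lo_mw < threshold_mw <= hi_mw:
            frac = (threshold_mw - lo_mw) / (hi_mw - lo_mw)
            return lo_p + frac * (hi_p - lo_p)
    return pts[-1][1]  # true probability is at least this value


if __name__ == "__main__":
    bands = {0.10: 12.0, 0.50: 20.0, 0.90: 26.0}
    print(f"P(output <= 16 MW) is roughly {prob_below(16.0, bands):.2f}")
```

The clamping at the edges shows why wider tails matter for this kind of query: with only P10 and P90 stored, any threshold below P10 can be answered only as "at most 10%".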

The pilot process revealed that "probabilistic forecast output" is not a single product. It is a distribution, and different customers need different slices of that distribution depending on their decision workflow. We now explicitly scope output format requirements in the pre-pilot technical call and design the API response envelope to be flexible enough to support custom quantile requests within a bounded set of options.
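As a rough illustration of "custom quantile requests within a bounded set of options", the sketch below validates a requested set of quantiles against an allowed list and builds a response envelope. Every name here (`ALLOWED_QUANTILES`, `build_envelope`, the `"quantiles"` key) is hypothetical and does not reflect our actual API schema.

```python
# Hypothetical bounded set of quantiles a customer may request.
ALLOWED_QUANTILES = (0.05, 0.10, 0.50, 0.90, 0.95)


def build_envelope(distribution, requested=(0.10, 0.50, 0.90)):
    """Return only the requested quantiles, labeled P5, P10, and so on.

    distribution: {probability: mw} covering every allowed quantile.
    Raises ValueError if a requested quantile is outside the allowed set.
    """
    unsupported = [q for q in requested if q not in ALLOWED_QUANTILES]
    if unsupported:
        raise ValueError(f"unsupported quantiles: {unsupported}")
    return {
        "quantiles": {
            f"P{round(q * 100)}": distribution[q] for q in sorted(requested)
        }
    }


if __name__ == "__main__":
    dist = {0.05: 8.0, 0.10: 12.0, 0.50: 20.0, 0.90: 26.0, 0.95: 29.0}
    print(build_envelope(dist, requested=(0.05, 0.50, 0.95)))
```

Keeping the set bounded means each supported quantile can be calibrated and validated ahead of time, rather than interpolated on demand for arbitrary percentiles.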

The 30-day pilot is the right length — for reasons we didn't initially anticipate

We chose 30 days as the pilot duration primarily for commercial reasons: it is long enough to demonstrate meaningful forecast skill but short enough that a customer can commit to the pilot without a board-level procurement process. What we found is that 30 days also turns out to be about the minimum period needed for the ML bias correction model to reach stable performance, assuming reasonably clean SCADA data.

In the first two weeks of each pilot, the correction model is training primarily on the most recent actuals, and the forecast skill improvement over raw NWP is real but variable: the model has seen too few cloud regime transitions to generalize well to unseen conditions. By week three to four, with 2,000+ 15-minute training intervals, the model has seen enough diurnal and synoptic variation to produce correction weights that hold up on hold-out windows. The accuracy curves from each 30-day pilot consistently show a larger improvement over raw NWP in the final week than in the first, a sign that the model is still learning well into the pilot window.
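The interval count is easy to check. A day has 96 fifteen-minute intervals, so 2,000 intervals takes about three weeks of complete data. The short sketch below does that arithmetic and shows how the picture changes once some intervals are excluded. The function name `days_to_reach` is hypothetical, and applying a usable fraction of 1/1.3 (from the roughly 30% extra calendar days mentioned earlier for the data quality scan) to the 15-minute training set is an assumption made for illustration.

```python
import math

INTERVALS_PER_DAY = 24 * 60 // 15  # 96 fifteen-minute intervals per day


def days_to_reach(target_intervals, usable_fraction=1.0):
    """Calendar days needed to collect target_intervals usable intervals.

    usable_fraction: share of intervals that survive quality filtering.
    """
    usable_per_day = INTERVALS_PER_DAY * usable_fraction
    return math.ceil(target_intervals / usable_per_day)


if __name__ == "__main__":
    print(days_to_reach(2000))           # complete, clean data
    print(days_to_reach(2000, 1 / 1.3))  # assuming ~30% more days are needed
```

With clean data the 2,000-interval mark arrives on day 21. Under the assumed exclusion rate it slips to day 28, which leaves very little margin inside a 30-day window.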

This means we push back on pilot requests for a two-week evaluation window. Two weeks is enough to show that the integration works and the API is functional. It is not enough to show what the forecast accuracy looks like at production maturity. An operator making a license decision based on a two-week MAE figure is making it on data from an incompletely trained model.

None of these lessons is surprising in hindsight. Data integration is hard. SCADA data is messy. Customer workflows don't match vendor assumptions. What changed is that each lesson is now a specific process checkpoint in the pre-pilot scoping, not a discovery that happens during the pilot itself. The pilots run more smoothly, but more importantly, they now produce accuracy results that reflect the system performing at its actual capability rather than a capability limited by avoidable data quality and integration friction.