Methods
Time Domain Reduction (TDR)
Rather than modeling and optimizing power grid operations at high temporal resolution (e.g., hourly, over a full year) while evaluating new capacity investments, which can be computationally expensive for large-scale studies with many resources, it is often useful to model annual grid operations at a reduced temporal resolution. Such time-domain reduction is commonly employed in capacity expansion models as a way to balance model spatial and temporal resolution and the representation of dispatch, while ensuring reasonable computational times. The time-domain reduction method provided here automates this process, while letting the user specify the various parameters of the time-domain reduction 'clustering' algorithm used to formulate the resulting optimization model.
Running a case with Time Domain Reduction
There are two ways to run a case with a reduced (e.g. less than full-year) temporal resolution.
- Let GenX perform the time domain reduction before optimizing.
- Bring your own clustered data
It's also possible to have GenX perform the clustering separately from the optimization task.
Method 1: Let GenX perform the time domain reduction (clustering)
Set TimeDomainReduction: 1 in the GenX settings for the case.
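For example, the case's genx_settings.yml might contain the following entry (an illustrative excerpt; all other settings omitted):

# genx_settings.yml (illustrative excerpt)
TimeDomainReduction: 1    # cluster the input time series before building the model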
When the case is run (but before the optimization model is built), reduced time series data will be output to a folder within the case, typically TDR_Results. Note that if the data already exist in that folder, they will not be overwritten. If a user wants to change the time-domain reduction settings and try again, the folder should be deleted before the case is run.
The clustering is done according to the settings in time_domain_reduction_settings.yml. These are described in the Inputs section of data_documentation.
Time domain clustering can only be performed on data which represents a single contiguous period: typically a year of 8760 or 8736 hours.
The header of the file Load_data.csv in the main case folder will typically look like this:
..., Rep_Periods, Timesteps_per_Rep_Period, Sub_Weights, ...
1, 8760, 8760,
For an example that uses this method, see Example_Systems/RealSystemExample/ISONE_Singlezone.
Method 2: Bring your own clustered data
The second method is to use an external program to generate the reduced ('clustered') time series data. For instance, PowerGenome has a capability to construct GenX cases with clustered time series data.
Running using this method requires setting TimeDomainReduction: 0 in the GenX settings for the case.
Clustered time series data requires specifying the clustering data using three columns in Load_data.csv: Rep_Periods, Timesteps_per_Rep_Period, and Sub_Weights. For example, a problem representing a full year via 3 representative weeks, where the first week represents one which is twice as common as the others, would look like
..., Rep_Periods, Timesteps_per_Rep_Period, Sub_Weights, ...
3, 168, 4368.0,
2184.0,
2184.0,
In this example, the first week represents a total of 26*168 = 4368 hours over a full year.
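As a quick sanity check, the Sub_Weights should sum to the number of hours being represented; here 52 weeks of 168 hours, i.e., 8736 hours. A minimal standalone Julia sketch (not a GenX API):

sub_weights = [4368.0, 2184.0, 2184.0]
@assert sum(sub_weights) == 52 * 168    # the three weeks together cover 8736 hours
@assert 26 * 168 == 4368                # the first week stands in for 26 of the 52 weeks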
The time series data are written in single unbroken columns: in this example, the Time_Index ranges from 1 to 504.
For problems involving Long Duration Storage, a file Period_map.csv is necessary to describe how these representative periods occur throughout the modeled year. See also the Inputs section of data_documentation.
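For the three-week example above, a Period_map.csv might begin as follows (a hypothetical excerpt assuming the usual three-column layout: Period_Index runs over all 52 weeks of the year, Rep_Period names the original period chosen to represent each week, and Rep_Period_Index gives its index among the representative periods):

Period_Index, Rep_Period, Rep_Period_Index
1, 8, 1
2, 8, 1
3, 24, 2
...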
For an example that uses this method, see Example_Systems/RealSystemExample/ISONE_Trizone.
Performing time domain reduction (TDR) separately from optimization
Added in 0.3.4
It may be useful to perform time domain reduction (TDR) (or "clustering") on a set of inputs before using them as part of a full GenX optimization task. For example, a user might want to test various TDR settings and examine the resulting clustered inputs. This can now be performed using the run_timedomainreduction! function.
$ julia --project=/home/youruser/GenX
julia> using GenX
julia> run_timedomainreduction!("/path/to/case")
This function will obey the settings in /path/to/case/Settings/time_domain_reduction_settings.yml. It will output the resulting clustered time series files in the case.
Running this function will overwrite these files in the case. This is done with the expectation that the user is trying out various settings.
Developer's docs for internal functions related to time domain reduction
GenX.RemoveConstCols — Function
RemoveConstCols(all_profiles, all_col_names)
Remove and store the columns that do not vary during the period.
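The idea, sketched in plain Julia on made-up data (not the internal implementation):

profiles = [1.0 0.5; 2.0 0.5; 3.0 0.5]    # second column never varies
col_names = ["Load_Z1", "Solar_Z1"]
keep = [length(unique(profiles[:, j])) > 1 for j in 1:size(profiles, 2)]
varying_profiles = profiles[:, keep]      # passed on to clustering
const_cols = col_names[.!keep]            # stored so they can be re-inserted afterwards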
GenX.check_condition — Method
check_condition(Threshold, R, OldColNames, ScalingMethod, TimestepsPerRepPeriod)
Check whether the greatest Euclidean deviation between the input data and the clustered representation is within a given proportion of the "maximum" possible deviation.
(1 for Normalization covers 100%, 4 for Standardization covers ~95%)
GenX.cluster — Function
cluster(ClusterMethod, ClusteringInputDF, NClusters, nIters)
Get representative periods using cluster centers from kmeans or kmedoids.
K-Means: https://juliastats.org/Clustering.jl/dev/kmeans.html
K-Medoids: https://juliastats.org/Clustering.jl/stable/kmedoids.html
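A minimal sketch of the underlying Clustering.jl calls on synthetic data (illustrative only; GenX wraps these with its own pre- and post-processing):

using Clustering
X = rand(168, 52)                 # one column per candidate week, 168 hourly values each
R = kmeans(X, 3; maxiter = 300)   # cluster the 52 weeks into 3 representative periods
a = assignments(R)                # cluster label for each of the 52 weeks
centers = R.centers               # 168 x 3 matrix of cluster centers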
GenX.cluster_inputs — Function
cluster_inputs(inpath, settings_path, v=false, norm_plot=false, silh_plot=false, res_plots=false, indiv_plots=false, pair_plots=false)
Use kmeans or kmedoids to cluster raw load profiles and resource capacity factor profiles into representative periods. Use Extreme Periods to capture noteworthy periods or periods with notably poor fits.
In time_domain_reduction_settings.yml, include the following (an illustrative example of such a settings file appears after this list):
- Timesteps_per_Rep_Period - Typically 168 timesteps (e.g., hours) per period, this designates the length of each representative period.
- UseExtremePeriods - Either 1 or 0, this designates whether or not to include outliers (by performance or load/resource extreme) as their own representative periods. This setting automatically includes the periods with maximum load, minimum solar cf and minimum wind cf as extreme periods.
- ClusterMethod - Either 'kmeans' or 'kmedoids', this designates the method used to cluster periods and determine each point's representative period.
- ScalingMethod - Either 'N' or 'S', this directs the module to normalize ([0,1]) or standardize (mean 0, variance 1) the input data.
- MinPeriods - The minimum number of periods used to represent the input data. If using UseExtremePeriods, this must be at least three. If IterativelyAddPeriods is off, this will be the total number of periods.
- MaxPeriods - The maximum number of periods - both clustered periods and extreme periods - that may be used to represent the input data.
- IterativelyAddPeriods - Either 1 or 0, this designates whether or not to add periods until the error threshold between input data and represented data is met or the maximum number of periods is reached.
- Threshold - Iterative period addition will end if the period farthest (Euclidean Distance) from its representative period is within this percentage of the total possible error (for normalization) or ~95% of the total possible error (for standardization). E.g., for a threshold of 0.01, every period must be within 1% of the spread of possible error before the clustering iterations will terminate (or until the max number of periods is reached).
- IterateMethod - Either 'cluster' or 'extreme', this designates whether to add clusters to the kmeans/kmedoids method or to set aside the worst-fitting periods as new extreme periods.
- nReps - The number of times to repeat each kmeans/kmedoids clustering at the same setting.
- LoadWeight - Default 1, this is an optional multiplier on load columns in order to prioritize better fits for load profiles over resource capacity factor profiles.
- WeightTotal - Default 8760, the sum to which the relative weights of representative periods will be scaled.
- ClusterFuelPrices - Either 1 or 0, this indicates whether or not to use the fuel price time series in Fuels_data.csv in the clustering process. If 'no', this function will still write Fuels_data_clustered.csv with reshaped fuel prices based on the number and size of the representative weeks, assuming a constant time series of fuel prices with length equal to the number of timesteps in the raw input data.
- MultiStageConcatenate - (Only considered if MultiStage = 1 in genx_settings.yml) If 1, this designates that the model should time domain reduce the input data of all model stages together. Else if 0, [still in development] the model will time domain reduce only the first stage and will apply the periods of each other model stage to this set of representative periods by closest Euclidean distance.
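Putting these settings together, a time_domain_reduction_settings.yml might look like the following (an illustrative sketch; values are arbitrary and should be tuned per study):

# time_domain_reduction_settings.yml (illustrative)
ClusterMethod: 'kmeans'
ScalingMethod: 'N'
Timesteps_per_Rep_Period: 168
MinPeriods: 8
MaxPeriods: 11
UseExtremePeriods: 1
IterativelyAddPeriods: 1
IterateMethod: 'cluster'
Threshold: 0.05
nReps: 100
LoadWeight: 1
WeightTotal: 8760
ClusterFuelPrices: 1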
GenX.get_absolute_extreme — Method
get_absolute_extreme(DF, statKey, col_names, ConstCols)
Get the period index of the single timestep with the minimum or maximum load or capacity factor.
GenX.get_extreme_period — Function
get_extreme_period(DF, GDF, profKey, typeKey, statKey, ConstCols, load_col_names, solar_col_names, wind_col_names)
Identify an extreme week by specification of profile type (Load, PV, Wind), measurement type (absolute, i.e., the timestep with the min/max value, vs. integral, i.e., the period with the min/max summed value), and statistic (minimum or maximum). E.g., a user who wants the hour with the most load across the whole system to be included among the extreme periods would select "Load", "System", "Absolute", and "Max".
GenX.get_integral_extreme — Method
get_integral_extreme(GDF, statKey, col_names, ConstCols)
Get the period index with the minimum or maximum load or capacity factor summed over the period.
GenX.get_load_multipliers — Function
get_load_multipliers(ClusterOutputData, ModifiedData, M, W, LoadCols, TimestepsPerRepPeriod, NewColNames, NClusters, Ncols)
Get multipliers to linearly scale the clustered load profiles zone-wise such that their weighted sum equals the original zonal total load. The load profiles are scaled later using these multipliers so that a copy of the original load is kept for validation.
Find $k_z$ such that:
\[\sum_{i \in I} L_{i,z} = \sum_{t \in T, m \in M} C_{t,m,z} \cdot \frac{w_m}{T} \cdot k_z \: \: \: \forall z \in Z\]
where $Z$ is the set of zones, $I$ is the full time domain, $T$ is the length of one period (e.g., 168 for one week in hours), $M$ is the set of representative periods, $L_{i,z}$ is the original zonal load profile over time (hour) index $i$, $C_{t,m,z}$ is the load in timestep $t$ for representative period $m$ in zone $z$, $w_m$ is the weight of representative period $m$, equal to the total number of hours that one hour in period $m$ represents in the original profile, and $k_z$ is the zonal load multiplier returned by the function.
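A numeric sketch of this definition for a single zone in plain Julia (illustrative data, not the GenX implementation):

T, M = 168, 3                        # hours per period, number of representative periods
L = rand(52 * 168)                   # original hourly zonal load
C = rand(T, M)                       # clustered load, one column per representative period
w = [4368.0, 2184.0, 2184.0]         # period weights, summing to length(L)
k = sum(L) / sum(C[t, m] * w[m] / T for t in 1:T, m in 1:M)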
GenX.get_worst_period_idx — Method
get_worst_period_idx(R)
Get the index of the period that is farthest from its representative period by Euclidean distance.
GenX.parse_data — Method
parse_data(myinputs)
Get load, solar, wind, and other curves from the input data.
GenX.parse_multi_stage_data — Method
parse_multi_stage_data(inputs_dict)
Get load, solar, wind, and other curves from multi-stage input data.
GenX.rmse_score — Method
rmse_score(y_true, y_pred)
Calculates Root Mean Square Error.
\[RMSE = \sqrt{\frac{1}{n}\Sigma_{i=1}^{n}{\Big(\frac{d_i -f_i}{\sigma_i}\Big)^2}}\]
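A minimal Julia sketch of the unnormalized case ($\sigma_i = 1$), treating $d$ as the true series and $f$ as the prediction (illustrative, not the internal code):

rmse(d, f) = sqrt(sum((d .- f) .^ 2) / length(d))
rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])    # ≈ 1.155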
GenX.scale_weights — Function
scale_weights(W, H)
Linearly scale weights W such that they sum to the desired number of timesteps (hours) H.
\[w_j \leftarrow H \cdot \frac{w_j}{\sum_i w_i} \: \: \: \forall w_j \in W\]
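Equivalently, in plain Julia (a one-line sketch, not the internal code):

scale(W, H) = H .* W ./ sum(W)
scale([1.0, 2.0, 1.0], 8760.0)    # == [2190.0, 4380.0, 2190.0]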
Multi-Stage Modeling
GenX can be configured for multi-stage modeling with perfect foresight. The dual dynamic program (DDP) algorithm is a well-known approach for solving multi-stage optimization problems in a computationally efficient manner, first proposed by Pereira and Pinto (1991). This algorithm splits up a multi-stage investment planning problem into multiple, single-period sub-problems. Each period is solved iteratively as a separate linear program sub-problem (“forward pass”), and information from future periods is shared with past periods (“backwards pass”) so that investment decisions made in subsequent iterations reflect the contributions of present-day investments to future costs. Multi-period modeling functionality is designed as a "wrapper" around GenX, and to the extent possible, existing methods were left unchanged.
The time-domain reduction method provided allows the user to automate these features by specifying the various parameters of the time-domain reduction algorithm (via time_domain_reduction_settings.yml, described under Model Inputs/Outputs documentation, Inputs), including the desired level of temporal resolution to be used in formulating the resulting optimization model.