Historically each and every LPAR was assigned a Relative Weight value; where a more meaningful description would be the initial processing weight. This relative weight value is used to determine which LPAR gains access to resources, where multiple LPARs are competing for the same resource. Being unit-less is one minor challenge of the relative weight value, meaning that it has no explicit CPU capacity or resource value. Typically installations would use a simple multiple of ten metric, most likely 1000, and allocate weights accordingly (E.g. 600=60%, 300=30%, 10=10%, et al). Therefore during periods of resource contention, PR/SM would allocate resources to the requisite LPAR, based upon its relative weight.
Using relative weight to classify all LPARs as equal, at least from a generic class viewpoint, does have some considerations; primarily differentiating between Production and Non-Production workloads. Restricting a workload to its relative weight share of resources is known as Hard Capping. This setting is typically used to restrict Non-Production (E.g. Test) environments to their allocated resource and is also useful for cost control (E.g. Outsourcers), knowing that the LPAR will never consume more than its allocated relative weight allowance.
Hard Capping behaviour changes dependent on the use of the HiperDispatch setting. When HiperDispatch is not chosen, capping is performed at the Logical CP level, where the goal is for each logical CP to receive its relative CP share, based on the relative weight setting. When HiperDispatch is active, vertical as opposed to horizontal CPU management applies. So, a High categorization dictates capping at 100% of the logical CP, whereas a Medium or Low setting allows for resource sharing based on a relative weight per CP basis.
The Intelligent Resource Director (IRD) function provides more advanced relative weight management, automating management of CPU resources and a subset of I/O resources. Workload Manager (WLM) manages physical CPU resource across z/OS images within an LPAR cluster based on service class goals. IRD is implemented as a collaboration between the WLM function and the PR/SM Logical Partitioning (LPAR) hypervisor:
- Logical CP Management: dynamically allocating logical processors (E.g. Vary On-Line/Off-Line)
- Relative Weight Management: dynamically redistributing CPU resource as per LPAR weights
- CHPID Management: dynamically assigning logical channel paths between eligible LPARs
IRD optimizes resource usage, enabling WLM to deliver workload goals.
The use of relative weight in association with Hard Capping and/or IRD/WLM granularity has become somewhat limited for most Mainframe installations with the advent of Sub-Capacity pricing (I.E. MLC via SCRT/R4HA). Primarily because there is no direct correlation to manage CPU resource at a meaningful level, namely the MSU (vis-à-vis CPU MIPS) metric.
Defined Capacity (DC) provides Sub-Capacity CEC pricing by allowing definition of LPAR capacity with a granularity of 1 MSU. In conjunction with the WLM function, the Defined Capacity of an LPAR dictates whether Soft Capping is invoked or not. At this juncture, we should consider how and when WLM measures CPU resource usage and if and when Soft Capping is activated and deactivated:
WLM is responsible for taking MSU utilization samples for each LPAR in 10-second intervals. Every 5 minutes, WLM documents the highest observed MSU sample value from the 10-second interval samples. This process always keeps track of the past 48 updates taken for each LPAR. When the 49th reading is taken, the 1st reading is deleted, and so on. These 48 values continually represent a total of 5 minutes * 48 readings = 240 minutes or the past 4 hours (I.E. R4HA). WLM stores the average of these 48 values in the WLM control block RCT.RCTLACS. Each time RMF (or BMC CMF equivalent) creates a Type 70 record, the SMF70LAC field represents the average of all 48 MSU values for the respective LPAR a particular Type 70 record represents. Hence, we have the “Rolling 4 Hour Average”. RMF gets the value populated in SMF70LAC from RCT.RCTLACS at the time the record is created.
SCRT also uses the Type 70 field SMF70WLA to ensure that the values recorded in SMF70LAC do not exceed the maximum available MSU capacity assigned to an LPAR. If this ever happens (due to Soft Capping or otherwise) SCRT uses the value in SMF70WLA instead of SMF70LAC. Values in SMF70WLA represent the total capacity available to the LPAR.
We should also consider the two possibilities for MLC software payment (I.E. SCRT) based upon MSU resource usage. Quite simply, the MSU value passed for SCRT invoice consideration is the R4HA or the Defined Capacity, whichever is the lowest. Put another way; if the R4HA exceeds Defined Capacity, Soft Capping applies to the LPAR.
The primary disadvantage of Soft Capping is that the Defined Capacity setting is somewhat static; it is manually defined once, maybe several times a day for workloads with distinct characteristics (E.g. On-Line, Batch, et al), but dynamic DC management based upon inter-related LPAR behaviour is at best, evolving. The primary considerations for Soft Capping are:
- An LPAR can only be managed via Soft Capping or Hard Capping; not both
- DC rules only applies to General Purpose CP’s (Hard Capping for Specialty Engines is allowed)
- An LPAR must be defined with shared CP’s (dedicated CP’s not allowed)
- All LPAR Sub-Capacity eligible products have the same MSU capacity (I.E. DC)
Soft Capping is relatively simple to implement and typically generates MLC software costs savings, with minimal impact.
Group Capacity Limit (GCL) provides an extension to the Defined Capacity (DC) Soft Capping function. GCL allows an MSU limit for total usage of all group LPARs, with a granularity of 1 MSU. The primary considerations for GCL are:
- Works with DC LPAR capacity settings
- Target share does not exceed DC
- Works with IRD
- Multiple CEC groups allowed; but an LPAR may only be defined to one group
An LPAR must be defined with shared CP’s, with WAIT COMPLETION = NO specification
It is possible to combine IRD weight management with the GCL function. Based on installation policy, IRD can modify the relative weight setting to redistribute capacity resource within an LPAR cluster.
However, IRD weight management is suspended when GCL is in effect, because LPAR resource entitlement within a capacity group can be (I.E. Pre zxC12) derived from the current weight. Hence the LPAR might get allocated an unacceptable low weight setting, generating a low GCL entitlement.
GCL also allows for MSU to be shared between LPARs in a group, where one LPAR would be a donator and another would be a receiver. Therefore the customer classifies their LPARs accordingly and when a high-priority LPAR requires additional MSU resource, it will be allocated from a lower priority LPAR, if available. This provides a modicum of flexibility, but by definition, peak workloads are not predictable and typically require a significantly higher amount of MSU for a short time period. Typically this requirement will not be satisfied with the GCL function.
Soft Capping techniques, either at the individual (DC) or group (GCL) level deliver cost saving benefit, but a fine granularity of management is required to balance cost saving versus associated performance considerations. The primary challenges associated with Soft Capping are its interactions with workload characteristics and an inability to dynamically manage MSU allocation, in-line with the R4HA. Put another way, the R4HA is derived from 48*5 Minute samples, whereas DC and GCL settings are typically defined on an infrequent (E.g. Monthly or longer) basis.
As z/OS evolves, further in-built function is available to manage MSU capacity. zSeries Capacity Provisioning Manager (CPM) is designed to simplify the management of temporary capacity, defined capacity and group capacity. The scope of z/OS Capacity Provisioning is to address capacity requirements for relatively short term workload fluctuations for which On/Off Capacity on Demand or Soft Capping changes are applicable. CPM is not a replacement for the customer derived Capacity Management process. Capacity Provisioning should not be used for providing additional capacity to systems that have Hard Capping (initial capping or absolute capping) defined.
With the introduction of z/OS 2.1, CPM functionality incorporates Soft Capping support via the DC and GCL functions. CPM functions from a set of installation defined policies and parameters, where the CPM server receives three types of input:
- Domain Configuration: defines the CPCs and z/OS systems to be managed
- Policy: contains the information as to which work is eligible, for which conditions and during which timeframes and capacity increases for constrained workloads
- Parameter: contains environment descriptors (E.g. UNIX Environment, Installation Options, et al)
From a customer viewpoint, policy definition allows them to define the provision of CPU resource:
- Date & Time: When capacity provisioning is allowed
- Workload: Which service class qualifies for provisioning?
- CPU Resource: How much additional MSU capacity can be allocated?
CPM provides more function when compared with Defined Capacity and Group Capacity Limit Soft Capping techniques. Therefore allowing for time schedules to be defined, workloads to be categorized and MSU resource to be allocated in a dynamic and granular manner.
A modicum of complexity exists when considering the arguably most important factor for CPM policy definition, namely the Performance Index (PI):
- Activation: PI of service class periods must exceed the activation threshold for a specified duration, before the work is considered as eligible.
- Deactivation: PI of service class periods must fall below the deactivation threshold for a specified duration, before the work is considered as ineligible.
- Null: If no workload condition is specified a scheduled activation/deactivation is performed; with full capacity as specified in the rule scope, unconditionally at the start and end times of the time condition.
For workload based provisioning it is a necessary condition that the current system Performance Index exceeds the specified customer policy PI metric. One must draw one’s own conclusions regarding PI criteria settings, but to date, they’re largely based on arguably complex mathematical formulae, which perhaps is not practicable, especially from a simple management viewpoint.
With the requisite hardware (I.E. zxC12+) and Operating System levels (I.E. z/OS 1.13+), CPM provides extra functionality for the customer to implement granular Soft Capping techniques to balance cost and performance. When compared with Defined Capacity and Group Capacity Limit techniques, CPM delivers increased granularity for managing capacity dynamically, based on customer derived policies, recognizing time slots, workloads and MSU resource increases accordingly.
From a big picture viewpoint, without doubt, we must recognize the fundamental role that WLM plays in Soft Capping. Quite simply, the 48*5 Minute MSU resource samples dictate whether a workload will be eligible for Soft Capping or not and from a cumulative viewpoint, these MSU samples dictate the R4HA metric. Based on this observation, efficient and functional Soft Capping must be workload based (I.E. WLM Service Class), be dynamic and operational on a 24*7 basis, because workload peaks are never predictable, while balancing MSU resource accordingly. Of course, simplicity of implementation and management, supplemented by meaningful reporting is mandatory.
Once again, observing the 48*5 Minute MSU resource samples from a R4HA viewpoint, if a workload was to increase MSU usage by an average of 50% for 1 Hour (I.E. 12 Samples), and decrease MSU usage by an average of 20% for 2.5 Hours (I.E. 30 Samples), from an average viewpoint, the R4HA has remained static. Therefore an optimum Soft Capping technique needs to recognize WLM service class requirements, reacting in a timely manner, increasing and decreasing MSU usage, to safeguard workload performance for Time Critical workloads, while optimizing SCRT MLC cost.
zDynaCap delivers automated capacity balancing within CPCs, Capacity Groups or Groups of LPARs. Central to zDynaCap are the predefined balancing policies. Within these balancing policies, users define their MSU ranges of Groups and LPARs and also the priorities of the associated LPAR Workload. zDynaCap continually monitors overall usage and compares this to the available capacity and the user defined MSU balancing policies. For example, should a high priority workload on one LPAR not get enough capacity, while a low priority workload on another within the group gets too much capacity, available MSU capacity is distributed according to customer derived balancing policies. Only if there is no leftover capacity to be rescheduled within the defined Group, and if the high or medium priority workload will be slowed down, will zDynaCap add MSU.
With zDynaCap Capacity Balancing, available MSU capacity is balanced within LPAR groups, safeguarding that during peak time the mission critical workload is processed as per business expectations (E.g. SLA/KPI) for the lowest possible MLC cost.
In conclusion, given the significance of IBM MLC software (E.g. z/OS, CICS, DB2, IMS, WebSphere MQ, et al) costs, arguably every Mainframe environment should deploy a capping technique for cost optimization. Hard Capping might work for some, but in all likelihood, Soft Capping is the primary choice for most Mainframe environments. For sure, IBM have delivered several Soft Capping techniques, with varying levels of function and granularity, namely Defined Capacity, Group Capacity Limit (GCL) and the zSeries Capacity Provisioning Manager (CPM). It was forever thus and the ISV community exists because they specialize, architect and deliver specialized solutions and zDynaCap is such a solution, recognizing the fundamental rules of IBM Mainframe Soft Capping, namely the underlying WLM and R4HA foundation.