z/OS Workload Manager (WLM): Balancing Cost & Performance

A sophisticated mechanism is required to orchestrate the allocation of System z resources (E.g. CPU, Memory, I/O) to multiple z/OS workloads, requiring differing business processing priorities. Put very simply, a mechanism is required to translate business processing requirements (I.E. SLA) into an automated and equitable z/OS performance manager. Such a mechanism will safeguard the highest possible throughput, while delivering the best possible system responsiveness. Ideally, such a mechanism will assist in delivering this optimal performance, for the lowest cost; for z/OS, primarily Workload License Charges (WLC) related. Of course, the Workload Management (WLM) z/OS Operating System component delivers this functionality.

A rhetorical question for all z/OS Performance Managers and z/OS MLC Cost Managers would be “how much importance does your organization place on WLM and how proactively do you manage this seemingly pivotal z/OS component”? In essence, this seems like a ridiculous question, yet there is evidence that suggests many organizations, both customer and ISV alike, don’t necessarily consider WLM to be a fundamental or high priority performance management discipline. Let’s consider several reasons why WLM is a fundamental component in balancing cost and performance for each and every z/OS environment:

  • CPU (MSU) Resource Capping: Whatever the capping method (I.E. Absolute, Hard, Soft), WLM is a controlling mechanism, typically in conjunction with PR/SM, determining when capping is initiated, how it is managed and when it is terminated. Therefore from a dispassionate viewpoint, any 3rd party ISV product that performs MSU optimization via soft capping mechanisms should ideally consider the same CPU (E.g. SMF Type 70, 72, 99) instrumentation data as WLM. Some solutions don’t offer this granularity (E.g. AutoSoftCapping, iCap).
  • MLC R4HA Cost Management: WLM is the fundamental mechanism for controlling this #1 System z software TCO component; namely WLM collects 48 consecutive metric CPU MSU resource usage every 5 Minutes, commonly known as the Rolling 4 Hour Average (R4HA). In an ideal world, an optimally managed workload that generates a “valid monthly peak”, will fully utilize this “already paid for” available CPU MSU resource for the remainder of the MLC eligible month (I.E. Start of the 2nd day in a calendar month, to the end of the 1st day in the next calendar month). More recently, Country Multiplex Pricing (CMP) allows an organization to move workloads between System z server (I.E. CPC) structures, without cost consideration for cumulative R4HA peaks. Similarly, Mobile Workload Pricing (MWP) reporting will be simplified with WLM service definitions in z/OS 2.2. Therefore it seems prudent that real-time WLM management, both in terms of real-time reporting and pro-active decision making makes sense.
  • System z Server CPU Management: As System z server CPU chips evolve (E.g. CPU Chip Cache Hierarchy and Relative Nest Intensity), there are complementary changes to the z/OS Operating System management components. For example, HiperDispatch Mode delivers CPU resource usage benefit, considering CPU chip cache resources, intelligently allocating workload to as few logical processors as possible. It therefore follows that prioritization of workloads via WLM policy definitions becomes increasingly important. In this instance one might consider that CPU MF (SMF Type 113) and WLM Topology (SMF Type 99) are complementary reporting techniques for System z server design and management.

Since its announcement in September 1994 (I.E. MVS/ESA Version 5), WLM has evolved to become a fully-rounded and highly capable z/OS System Resources Manager (SRM), simply translating business prioritization policies into dynamic function, optimizing System z CPU, Memory and I/O resources. More recently, WLM continues to simplify the management of CPU chip cache hierarchy resources, while reporting abilities gain in strength, with topology reporting and the promise of simplified MWP reporting. Moreover, WLM resource management becomes more granular and seemingly the realm of possibility exists to “micro manage” System z performance, as and if required. Conversely, WLM provides the opportunity to simplify System z performance management, with intelligent workload differentiation (I.E. Subsystem Enclave, Batch, JES, USS, et al).

Quite simply, IBM are providing the instrumentation and tools for the 21st Century System z Performance and Software Cost Subject Matter Expert (SME) to deliver optimal performance for minimal cost. However, it is incumbent for each and every System z user to optimize software TCO, proactively implementing new processes and leveraging from System z functions accordingly.

Returning to that earlier rhetorical question about the importance of WLM; seemingly its importance is without doubt, primarily because of its instrumentation and management abilities of increasingly cache rich System z CPU chips and its fundamental role in controlling CPU MSU resource, vis-à-vis the R4HA.

Although IBM will provide the System z user with function to optimize system performance and cost, for obvious commercial reasons IBM will not reduce the base cost of System z MLC software. However, recent MLC pricing announcements, namely Country Multiplex Pricing (CMP), Mobile Workload Pricing (MWP) and Collocated Application Pricing (zCAP) provide tangible options to reduce System z MLC TCO. Therefore the System z user might need to consider how they can access real-time WLM performance metrics, intelligently combining this instrumentation data with function to intelligently optimize CPU MSU resource, managing the R4HA accordingly.

Workload X-Ray (WLXR) from zIT Consulting simplifies WLM performance reporting, enabling users to drill down into the root cause of performance variances in a very fast and easy way. WLXR assists in root cause problem determination by zooming in, starting from a high level overview, going right down to detailed Service Class performance information, such as the Performance Index (PI), showing potential bottleneck situations during peak time. Any system overhead considerations are limited, as WLXR delivers meaningful real time information on a “need to know” basis.

A fundamental design objective for WLXR is data reduction, only delivering the important information required for timely and professional workload management. Straight to the point information instead of data overload, sometimes from a plethora of data sources (E.g. SMF, System Monitors, et al). WLXR incorporates the following easy-to-use functions:

  • Simplified Data Collection & Storage: Minimal system overhead TCP/IP based agents periodically (E.g. 5, 15, 60 Minutes) collect CPU (Type 70) and WLM (Type 72) data. Performance data is stored centrally in near real-time, building a historical repository with intelligent analytics for meaningful information presentation.
  • Intelligent GUI Based Information Presentation: Meaningful decision based reports and graphs detailing CPU (E.g. MSU, R4HA, Weight) and WLM (E.g. Service Class, Performance Index, Response Time, Transaction Workflow) resource usage. A drill-down design provides a granularity of data presentation, for Management Summary to 3rd Level Technical Diagnostics use.
  • Corporate Identity Branding: A modular template design, allowing for easy corporate identity branding, with flexibility to easily add additional reports, as and if required.

Without doubt, WLM is a significant z/OS System Resources Management function, simplifying the translation of business workload requirements (I.E. Service Level Agreement) into timely and proactive allocation of major System z hardware resources (I.E. CPU, Memory, I/O). This management of System z resources has been forever thus for 20+ years, while WLM has always offered “software cost control” functionality, working with the various and evolving CPU capping techniques. What might not be so obvious, is that there is a WLM orientated price versus performance correlation, which has become more evident in the last 5 years or so. Whether Absolute Capping, HiperDispatch, Mobile Workload Pricing, Country Multiplex Pricing or evolving Soft Capping techniques, the need for System z users to integrate z/OS MLC pricing considerations alongside WLM performance based management is evident.

Historically there was not a clear and identified need for a z/OS Performance/Capacity Manager to consider MLC costs in their System z server designs. However, there is a clear and present danger that this historic modus operandi continues and there will only be one financial winner, namely IBM, with unnecessarily high MLC charges. Each and every System z user, whether large or small, can safeguard the longevity of their IBM Mainframe platform by recognizing and deploying proactive and current System z MLC cost management processes.

All too often it seems that capping can be envisaged as punitive, degrading system performance to reduce System z MLC costs. Such a notion needs to be consigned to history, with a focussed perspective on MSU optimization, where the valuable and granular MSU resource is allocated to the workload that requires such CPU resource, with near real-time performance profiling. If we perceive MSU optimization to be R4HA based and that IBM are increasing WLM function to assist this objective, CPU capping can be a benefit that does not adversely impact performance. As previously stated, once a valid R4HA peak has occurred, that high MSU watermark is available for the remainder of the MLC billing period. Similarly at a more granular level, once a workload has peaked and its MSU usage declines, the available MSU can be redirected to other workloads. With the introduction of Country Multiplex Pricing, System z users no longer need to concern themselves about creating a higher R4HA peak, when moving workloads between System z servers.

Quite simply, from the two most important perspectives, performance and cost optimization, WLM provides the majority of functionality to assist System z users get the best performance for the lowest cost. Analytics based products like Workload X-Ray (WLXR) assist this endeavour, analysing WLM data in near real-time from a performance and MLC cost perspective. It therefore follows that if this important information is also available for sophisticated MSU optimization solutions, which consider WLM performance (E.g. zDynaCap, zPrice Manager), then proactive performance and cost management follows. It’s hard to envisage how a fully-rounded MSU optimization decision can be implemented in near real-time, from an MSU optimization solution that does not consider WLM performance metrics…

IFL – A Cost Efficient zSeries Platform?

In September 2000, IBM introduced the Integrated Facility for Linux (IFL) processor, a specialty engine for and some might say dedicated to running the Linux Operating System.  At the time of this announcement, companion software named S/390 Virtual Image Facility for Linux was introduced to assist in the rapid deployment of IFL configurations, especially for non-Mainframe personnel.  However, this product was quickly discontinued, in favour of the standard z/VM Operating System, which is not difficult to learn and can accommodate hundreds if not thousands of zLinux images.

Today, the IFL is still a processor dedicated to Linux workloads on IBM System z servers.  The IFL is supported by z/VM virtualization and the Linux operating system.  The IFL cannot run other IBM operating systems.  The competitively priced IFL processor is a CPU capacity enabler, exclusively for Linux workloads.  Linux deployment (I.E. SUSE & Red Hat) on IFL’s can reduce expenses in the areas of operational efforts, energy, floor space and especially software.

The IFL provides the following functions and benefits:

  • The IBM Enterprise Linux Server is a dedicated System z Linux server, comprised of only IFL processors
  • No additional IBM software charges for traditional (E.g. z/OS, CICS, DB2, WebSphere, et al) environment
  • Performance improvement for Linux workloads with each successive generation of IFL and System z technology
  • Linux workload on the IFL does not result in increased IBM software charges for traditional System z operating systems and middleware
  • Same functionality as a General Purpose processor on a System z server
  • HiperSockets can be used for communication between Linux images, or Linux and other operating system images on the same System z system
  • z/VM virtualization and most IBM Linux middleware products, plus most vendor software products are priced per processor (core) according to the System z IBM International Program License Agreement (IPLA).  IPLA products have a one-time-charge (OTC) and an annual (optional) maintenance charge, called Subscription & Support
  • Supported by the current z/VM virtualization and IBM Wave for z/VM software versions
  • Always a full capacity processor, independent of the capacity of the other processors in the server
  • Orderable as a System z hardware feature. The number of orderable IFL features varies by the server model and configuration
  • Designed to operate asynchronously with other General Purpose processors
  • Managed by PR/SM in logical partition with dedicated or shared processors. The implementation of an IFL requires a Logical Partition (LPAR) definition, where following normal LPAR activation procedure, LPAR defined with an IFL cannot be shared with a general purpose processor.

There will always be the debate as to which processor and associated server type (E.g. x86, POWER, SPARC) is the most cost efficient, but there is no doubt that the ability to accommodate hundreds if not thousands of zLinux instances in one zServer environmental (E.g. Power, Cooling, Floor Space, et al) friendly footprint, with software pricing per core is worthy of consideration.

Adoption for zLinux has been steady and especially in the emerging territories where it’s not unusual for zSeries deployments to be totally zLinux (I.E. IBM Enterprise Linux Server) based.  Moreover, the majority of large and traditional IBM Mainframe users (I.E. z/OS) have installed at least one IFL, if only to evaluate the z/VM and zLinux offering.  Many have deployed the IFL and associated zLinux solution for business requirements.

Therefore, if one of the major cost benefit features of IFL is optimized software costs; can the IFL processor be considered for other workloads, originating from the traditional zSeries (I.E. z/OS) environments?

Proximal Systems Corporation (PSC) is a company with a solution that transparently offloads data processing from IBM Mainframes to Distributed Systems, with an objective of reducing software cost, while maintaining or improving performance.  The company name is derived from the concept of bringing disparate computing systems into close proximity, functionally speaking, providing totally seamless and transparent interoperability.  The result is a unified computing complex within which various tasks can be easily migrated between systems to their most cost efficient operating environment, while still being able to interoperate as if they were all hosted together on the same system.

The PSC Proxy Coupling Technology allows for a CPU orientated task to be offloaded from one system to another by means of an associated proxy task, which has an identical interface as the task to be offloaded, but delegates the majority of the processing to an offloaded task on another system.  The primary objective of this function are for the cost savings and/or performance improvements that might be delivered by migrating tasks to systems that are able to execute those tasks more efficiently.

The fact that the proxy task maintains the same interface as the application being replaced is crucial; as many past Mainframe migration projects have failed due to insurmountable interoperability problems between the Mainframe and Distributed Systems servers (I.E. Windows, Linux, UNIX, et al).  Proxy Coupling Technology offers a solution to this long-standing challenge.  In theory, this allows for the transparent offload of a traditional z/OS workload (E.g. Sort) from General Purpose (GP) processors, to a less expensive (E.g. IFL) alternative…

In the first instance, the Proxy Coupling Technology offloads General Purpose CPU workload associated with the z/OS sort (I.E. CA Sort, DFSORT, Syncsort) function, to another platform (E.g. IFL).  For IFL based implementations, HyperScokets are utilized to transfer data at memory speeds from the z/OS task to zLinux on the IFL, where the sort operation completes, while the resulting z/OS task and associated data are maintained, as per normal.  From an IFL viewpoint, Ahlsort software performs the sort operation, being a sort solution that maintains compatibility with the majority of z/OS sort function (I.E. Control Card Syntax).  Therefore, this is a transparent implementation, where the only consideration is how much CPU capacity is required for the offload function (E.g. IFL, x86).  The benefits are reduced z/OS MSU usage for the sort function, which can be quite significant, as most business data (E.g. Database Offloads, Customer Orientated, et al) is sorted on a daily if not more frequent basis.

Just as IBM introduced the zAAP on zIIP capability, which allowed some customers to more easily justify a specialty engine (I.E. zIIP), combining workloads to exploit the full capability of the specialty engine; in theory the same ethos applies with the Proxy Coupling Technology.  For the avoidance of doubt, workloads that can be processed on an IFL, such as z/OS sort tasks, can assist in delivering higher Return On Investment (ROI) levels for the IFL, for example:

  • Reduced z/OS WLC MSU usage (I.E. Sort function offload) and associated software costs savings
  • IFL processors run at Full Speed and do not add to traditional workload (I.E. z/OS) software costs
  • Utilize any spare IFL CPU resource not used, releasing General Purpose CPU resource for other work

In conclusion, the Proxy Coupling Technology offers a proposition that is similar to the IBM philosophy of reducing z/OS software costs via specialty engines.  Seemingly to date, primarily only the zIIP and zAAP specialty engines were available to optimize CPU usage for z/OS workloads.  Offloading CPU cycles and thus MSU workload to IFL makes sense, utilizing a cost efficient and indeed a full power CPU engine, where for cost reasons, maybe the majority of z/OS customers don’t deploy the “highest” derivative of General Purpose CPU engine available to them.  On the face of it, the realm of possibility exists for other workloads to benefit from z/OS to IFL CPU offload, following sort, which seems to make sense as the first workload to utilize this solution.

z/OS Soft Capping: Balancing Cost & Performance

Historically each and every LPAR was assigned a Relative Weight value; where a more meaningful description would be the initial processing weight. This relative weight value is used to determine which LPAR gains access to resources, where multiple LPARs are competing for the same resource. Being unit-less is one minor challenge of the relative weight value, meaning that it has no explicit CPU capacity or resource value. Typically installations would use a simple multiple of ten metric, most likely 1000, and allocate weights accordingly (E.g. 600=60%, 300=30%, 10=10%, et al). Therefore during periods of resource contention, PR/SM would allocate resources to the requisite LPAR, based upon its relative weight.

Using relative weight to classify all LPARs as equal, at least from a generic class viewpoint, does have some considerations; primarily differentiating between Production and Non-Production workloads. Restricting a workload to its relative weight share of resources is known as Hard Capping. This setting is typically used to restrict Non-Production (E.g. Test) environments to their allocated resource and is also useful for cost control (E.g. Outsourcers), knowing that the LPAR will never consume more than its allocated relative weight allowance.

Hard Capping behaviour changes dependent on the use of the HiperDispatch setting. When HiperDispatch is not chosen, capping is performed at the Logical CP level, where the goal is for each logical CP to receive its relative CP share, based on the relative weight setting. When HiperDispatch is active, vertical as opposed to horizontal CPU management applies. So, a High categorization dictates capping at 100% of the logical CP, whereas a Medium or Low setting allows for resource sharing based on a relative weight per CP basis.

The Intelligent Resource Director (IRD) function provides more advanced relative weight management, automating management of CPU resources and a subset of I/O resources. Workload Manager (WLM) manages physical CPU resource across z/OS images within an LPAR cluster based on service class goals. IRD is implemented as a collaboration between the WLM function and the PR/SM Logical Partitioning (LPAR) hypervisor:

  • Logical CP Management: dynamically allocating logical processors (E.g. Vary On-Line/Off-Line)
  • Relative Weight Management: dynamically redistributing CPU resource as per LPAR weights
  • CHPID Management: dynamically assigning logical channel paths between eligible LPARs

IRD optimizes resource usage, enabling WLM to deliver workload goals.

The use of relative weight in association with Hard Capping and/or IRD/WLM granularity has become somewhat limited for most Mainframe installations with the advent of Sub-Capacity pricing (I.E. MLC via SCRT/R4HA). Primarily because there is no direct correlation to manage CPU resource at a meaningful level, namely the MSU (vis-à-vis CPU MIPS) metric.

Defined Capacity (DC) provides Sub-Capacity CEC pricing by allowing definition of LPAR capacity with a granularity of 1 MSU. In conjunction with the WLM function, the Defined Capacity of an LPAR dictates whether Soft Capping is invoked or not. At this juncture, we should consider how and when WLM measures CPU resource usage and if and when Soft Capping is activated and deactivated:

WLM is responsible for taking MSU utilization samples for each LPAR in 10-second intervals. Every 5 minutes, WLM documents the highest observed MSU sample value from the 10-second interval samples. This process always keeps track of the past 48 updates taken for each LPAR. When the 49th reading is taken, the 1st reading is deleted, and so on. These 48 values continually represent a total of 5 minutes * 48 readings = 240 minutes or the past 4 hours (I.E. R4HA). WLM stores the average of these 48 values in the WLM control block RCT.RCTLACS. Each time RMF (or BMC CMF equivalent) creates a Type 70 record, the SMF70LAC field represents the average of all 48 MSU values for the respective LPAR a particular Type 70 record represents. Hence, we have the “Rolling 4 Hour Average”. RMF gets the value populated in SMF70LAC from RCT.RCTLACS at the time the record is created.

SCRT also uses the Type 70 field SMF70WLA to ensure that the values recorded in SMF70LAC do not exceed the maximum available MSU capacity assigned to an LPAR. If this ever happens (due to Soft Capping or otherwise) SCRT uses the value in SMF70WLA instead of SMF70LAC. Values in SMF70WLA represent the total capacity available to the LPAR.

We should also consider the two possibilities for MLC software payment (I.E. SCRT) based upon MSU resource usage. Quite simply, the MSU value passed for SCRT invoice consideration is the R4HA or the Defined Capacity, whichever is the lowest. Put another way; if the R4HA exceeds Defined Capacity, Soft Capping applies to the LPAR.

The primary disadvantage of Soft Capping is that the Defined Capacity setting is somewhat static; it is manually defined once, maybe several times a day for workloads with distinct characteristics (E.g. On-Line, Batch, et al), but dynamic DC management based upon inter-related LPAR behaviour is at best, evolving. The primary considerations for Soft Capping are:

  • An LPAR can only be managed via Soft Capping or Hard Capping; not both
  • DC rules only applies to General Purpose CP’s (Hard Capping for Specialty Engines is allowed)
  • An LPAR must be defined with shared CP’s (dedicated CP’s not allowed)
  • All LPAR Sub-Capacity eligible products have the same MSU capacity (I.E. DC)

Soft Capping is relatively simple to implement and typically generates MLC software costs savings, with minimal impact.

Group Capacity Limit (GCL) provides an extension to the Defined Capacity (DC) Soft Capping function. GCL allows an MSU limit for total usage of all group LPARs, with a granularity of 1 MSU. The primary considerations for GCL are:

  • Works with DC LPAR capacity settings
  • Target share does not exceed DC
  • Works with IRD
  • Multiple CEC groups allowed; but an LPAR may only be defined to one group
    An LPAR must be defined with shared CP’s, with WAIT COMPLETION = NO specification

It is possible to combine IRD weight management with the GCL function. Based on installation policy, IRD can modify the relative weight setting to redistribute capacity resource within an LPAR cluster.

However, IRD weight management is suspended when GCL is in effect, because LPAR resource entitlement within a capacity group can be (I.E. Pre zxC12) derived from the current weight. Hence the LPAR might get allocated an unacceptable low weight setting, generating a low GCL entitlement.

GCL also allows for MSU to be shared between LPARs in a group, where one LPAR would be a donator and another would be a receiver. Therefore the customer classifies their LPARs accordingly and when a high-priority LPAR requires additional MSU resource, it will be allocated from a lower priority LPAR, if available. This provides a modicum of flexibility, but by definition, peak workloads are not predictable and typically require a significantly higher amount of MSU for a short time period. Typically this requirement will not be satisfied with the GCL function.

Soft Capping techniques, either at the individual (DC) or group (GCL) level deliver cost saving benefit, but a fine granularity of management is required to balance cost saving versus associated performance considerations. The primary challenges associated with Soft Capping are its interactions with workload characteristics and an inability to dynamically manage MSU allocation, in-line with the R4HA. Put another way, the R4HA is derived from 48*5 Minute samples, whereas DC and GCL settings are typically defined on an infrequent (E.g. Monthly or longer) basis.

As z/OS evolves, further in-built function is available to manage MSU capacity. zSeries Capacity Provisioning Manager (CPM) is designed to simplify the management of temporary capacity, defined capacity and group capacity. The scope of z/OS Capacity Provisioning is to address capacity requirements for relatively short term workload fluctuations for which On/Off Capacity on Demand or Soft Capping changes are applicable. CPM is not a replacement for the customer derived Capacity Management process. Capacity Provisioning should not be used for providing additional capacity to systems that have Hard Capping (initial capping or absolute capping) defined.

With the introduction of z/OS 2.1, CPM functionality incorporates Soft Capping support via the DC and GCL functions. CPM functions from a set of installation defined policies and parameters, where the CPM server receives three types of input:

  • Domain Configuration: defines the CPCs and z/OS systems to be managed
  • Policy: contains the information as to which work is eligible, for which conditions and during which timeframes and capacity increases for constrained workloads
  • Parameter: contains environment descriptors (E.g. UNIX Environment, Installation Options, et al)

From a customer viewpoint, policy definition allows them to define the provision of CPU resource:

  • Date & Time: When capacity provisioning is allowed
  • Workload: Which service class qualifies for provisioning?
  • CPU Resource: How much additional MSU capacity can be allocated?

CPM provides more function when compared with Defined Capacity and Group Capacity Limit Soft Capping techniques. Therefore allowing for time schedules to be defined, workloads to be categorized and MSU resource to be allocated in a dynamic and granular manner.

A modicum of complexity exists when considering the arguably most important factor for CPM policy definition, namely the Performance Index (PI):

  • Activation: PI of service class periods must exceed the activation threshold for a specified duration, before the work is considered as eligible.
  • Deactivation: PI of service class periods must fall below the deactivation threshold for a specified duration, before the work is considered as ineligible.
  • Null: If no workload condition is specified a scheduled activation/deactivation is performed; with full capacity as specified in the rule scope, unconditionally at the start and end times of the time condition.

For workload based provisioning it is a necessary condition that the current system Performance Index exceeds the specified customer policy PI metric. One must draw one’s own conclusions regarding PI criteria settings, but to date, they’re largely based on arguably complex mathematical formulae, which perhaps is not practicable, especially from a simple management viewpoint.

With the requisite hardware (I.E. zxC12+) and Operating System levels (I.E. z/OS 1.13+), CPM provides extra functionality for the customer to implement granular Soft Capping techniques to balance cost and performance. When compared with Defined Capacity and Group Capacity Limit techniques, CPM delivers increased granularity for managing capacity dynamically, based on customer derived policies, recognizing time slots, workloads and MSU resource increases accordingly.

From a big picture viewpoint, without doubt, we must recognize the fundamental role that WLM plays in Soft Capping. Quite simply, the 48*5 Minute MSU resource samples dictate whether a workload will be eligible for Soft Capping or not and from a cumulative viewpoint, these MSU samples dictate the R4HA metric. Based on this observation, efficient and functional Soft Capping must be workload based (I.E. WLM Service Class), be dynamic and operational on a 24*7 basis, because workload peaks are never predictable, while balancing MSU resource accordingly. Of course, simplicity of implementation and management, supplemented by meaningful reporting is mandatory.

Once again, observing the 48*5 Minute MSU resource samples from a R4HA viewpoint, if a workload was to increase MSU usage by an average of 50% for 1 Hour (I.E. 12 Samples), and decrease MSU usage by an average of 20% for 2.5 Hours (I.E. 30 Samples), from an average viewpoint, the R4HA has remained static. Therefore an optimum Soft Capping technique needs to recognize WLM service class requirements, reacting in a timely manner, increasing and decreasing MSU usage, to safeguard workload performance for Time Critical workloads, while optimizing SCRT MLC cost.

zDynaCap delivers automated capacity balancing within CPCs, Capacity Groups or Groups of LPARs. Central to zDynaCap are the predefined balancing policies. Within these balancing policies, users define their MSU ranges of Groups and LPARs and also the priorities of the associated LPAR Workload. zDynaCap continually monitors overall usage and compares this to the available capacity and the user defined MSU balancing policies. For example, should a high priority workload on one LPAR not get enough capacity, while a low priority workload on another within the group gets too much capacity, available MSU capacity is distributed according to customer derived balancing policies. Only if there is no leftover capacity to be rescheduled within the defined Group, and if the high or medium priority workload will be slowed down, will zDynaCap add MSU.

With zDynaCap Capacity Balancing, available MSU capacity is balanced within LPAR groups, safeguarding that during peak time the mission critical workload is processed as per business expectations (E.g. SLA/KPI) for the lowest possible MLC cost.

In conclusion, given the significance of IBM MLC software (E.g. z/OS, CICS, DB2, IMS, WebSphere MQ, et al) costs, arguably every Mainframe environment should deploy a capping technique for cost optimization. Hard Capping might work for some, but in all likelihood, Soft Capping is the primary choice for most Mainframe environments. For sure, IBM have delivered several Soft Capping techniques, with varying levels of function and granularity, namely Defined Capacity, Group Capacity Limit (GCL) and the zSeries Capacity Provisioning Manager (CPM). It was forever thus and the ISV community exists because they specialize, architect and deliver specialized solutions and zDynaCap is such a solution, recognizing the fundamental rules of IBM Mainframe Soft Capping, namely the underlying WLM and R4HA foundation.

IBM Mainframe: Workload License Charges (WLC) Pros & Cons

It is estimated that less than half of eligible IBM Mainframe customers deploy the VWLC pricing mechanism, which in theory, is the lowest cost IBM software pricing metric.  Why?  In the first instance, let’s review the terminology…

Workload License Charges (WLC) is a monthly software license pricing metric applicable to IBM System z servers running z/OS or z/TPF in z/Architecture (64-bit) mode.  The fundamental ethos of WLC is a “pay for what you use” mechanism, allowing a lower cost of incremental growth and the potential to manage software cost by managing associated workload utilization.

WLC charges are either VWLC (Variable) or FWLC (Flat).  Not all IBM Mainframe software products are classified as VWLC eligible, but the major software is, including z/OS, CICS, DB2, IMS and WebSphere MQ, where these products are the most expensive, per MSU.  What IBM consider to be legacy products, are classified as FWLC.  More recently a modification to the VWLC mechanism was announced, namely AWLC (Advanced), strictly aligned with the latest generation of zSeries servers, namely zEC12, z196 and z114.  For the smaller user, the EWLC (Entry) mechanism applies, where AEWLC would apply for the z114 server.  There is a granular cost structure based on MSU (CPU) capacity that applies to VWLC and associated pricing mechanisms:

Band MSU Range
Base 0-3 MSU
Level 0 4-45 MSU
Level 1 46-175 MSU
Level 2 176-315 MSU
Level 3 316-575 MSU
Level 4 576-875 MSU
Level 5 876-1315 MSU
Level 6 1316-1975 MSU
Level 7 1976+ MSU

Put simply, as the MSU band increases, the related cost per MSU decreases.

IBM Mainframe users can further implement cost control by specifying how much MSU resource they use by deploying Sub-Capacity and Soft Capping techniques.  Defined Capacity (DC) allows the sizing of an LPAR in MSU, and so said LPAR will not exceed this MSU amount.  Group Capacity Limit (GCL) extends the Defined Capacity principle for a single LPAR to a group of LPARs, and so allowing MSU resource to be shared accordingly.  A potential downside of GCL is that is one LPAR of the group can consume all available MSU due to a rogue transaction (E.g. loop).

Sub-Capacity software charges are based upon LPAR hardware utilization, where the product runs, measured in hourly intervals.  To smooth out isolated usage peaks, a Rolling 4-Hour Average (R4HA) is calculated for each LPAR combination, and so software charges are based on the Monthly R4HA peak of appropriate LPAR combinations (I.E. where the software product runs) and not based on individual product measurement.

Once a Defined Capacity LPAR is deployed, this informs WLM (Workload Manager) to monitor the R4HA utilization of that LPAR.  If the LPAR R4HA utilization is less than the Defined Capacity, nothing happens.  If the LPAR R4HA utilization exceeds the Defined Capacity, then WLM signals to PR/SM and requests that Soft Capping be initiated, constraining the LPAR workload to the Defined Capacity level.

If a user chooses a Sub-Capacity WLC pricing mechanism, they will be required by IBM to submit a monthly Sub-Capacity Reporting Tool (SCRT) report.  Monthly WLC invoices are based upon hourly utilization metrics of LPAR hardware utilization, where the software product executes.  The cumulative R4HA and bottom line WLC billing metric is calculated for each product and associated LPAR group and not based on individual product measurement.

Bottom Line: From a Soft Capping viewpoint, the customer only pays for WLC software based upon the Defined Capacity (DC) or Rolling 4-Hour Average (R4HA), whichever is the lowest.  So whether a customer uses Soft Capping or not, in all likelihood, there will be occasions when their workload R4HA is lower than their zSeries server MSU capacity.

So, at first glance, VWLC seems to provide a compelling pricing metric, based upon Sub-Capacity and a pay for what you use ethos, and so why wouldn’t an IBM Mainframe user deploy this pricing metric?

The IBM Planning for Sub-Capacity Pricing (SA22-7999-0n) manual states “For IBM System z10 BC and System z9 BC environments, and z890 servers, EWLC pricing is the default for z/OS systems, and Sub-Capacity pricing is always the best option.  For IBM zEnterprise 114, environments, AEWLC pricing is the default for z/OS systems, and Sub-Capacity pricing is always the best option.  For IBM zEnterprise 196, System z10 EC and System z9 EC environments, and other zSeries servers, Sub-Capacity pricing is cost-effective for many, but not all, customers.  You might even find that Sub-Capacity pricing is cost effective for some of your CPCs, but not others (although if you want pricing aggregation, you must always use the same pricing for all the CPCs in the same sysplex)”.

Conclusion: For all small Mainframe users qualifying for the EWLC (AEWLC) pricing metric, arguably this pricing mechanism is mandatory.  For the majority of larger Mainframe users, the same applies, although a granularity of adoption might be required.  IBM also have a disclaimer “Once you decide to use Sub-Capacity pricing for a specific operating system family, you cannot return to the alternative pricing methods for that operating system family on that CPC.  For example, once you select WLC you may not switch back to PSLC without prior IBM approval”.  However, the requisite contractual exit clause option does exist; the customer can switch back to the PSLC pricing metric.

Some IBM Mainframe users might object to a notion of Soft Capping, relying upon their tried and tested methodology of LPAR management via the number of CPs allocated and associated PR/SM Weight.  This is seemingly a valid notion and requirement, prioritizing performance ahead of cost optimization.

Conclusion: As previously indicated, with VWLC, SCRT invoices are generated upon a premise of the customer only pays for WLC software based upon the Defined Capacity (DC) or Rolling 4-Hour Average (R4HA), whichever is the lowest.  So the VWLC pricing mechanism should deliver a granularity of cost savings, typically higher for a Soft Capping environment.

Some IBM Mainframe users might just believe that nothing can match their Parallel Sysplex Licensing Charge (PSLC) mechanism, first available in the late 1990’s, which might be attributable to other 3rd party ISV’s who cannot and will not allow for their software to be priced on a Sub-Capacity basis.  In reality, adopting the VWLC pricing mechanism delivers ~5% cost savings when compared with PSLC, as indicated by the IBM Planning for Sub-Capacity Pricing Manual (SA22-7999-0n) and related Sub-Capacity Planning Tool (SCPT).

Conclusion: Adopting Sub-Capacity based pricing metrics can only be a good thing.  If your 3rd party ISV supplier doesn’t recognise Sub-Capacity pricing, whether MIPS or MSU based, perhaps you should consider your relationship with them.  Regardless, the z10 server was the last IBM Mainframe to incorporate the “Technology Dividend” solely based on faster CPU chips.  The lower cost WLC pricing metric is now only available with the AWLC and related (E.g. AEWLC) pricing metrics, as per the z196, z114 and zEC12 servers.

Some customers might state that there is a lack of function or granularity of policy definition for IBM supplied Soft Capping (E.g. DC, GCL) or Workload Management (WLM) techniques.  To some extent this is a valid argument, but wasn’t it forever thus with IBM function?  Sub-Capacity implementation is possible via IBM, as is Workload Management (WLM), Soft Capping or not, but should the customer require extra functionality, 3rd party software solutions are available.

The zDynaCap software solution from zIT Consulting delivers a “Capacity Balancing” mechanism, integrating with R4HA and WLM methodologies, but constantly monitoring MSU usage to determine whether CPU resource can be reallocated to Mission & Time Critical workloads, based upon granular customer policies.  The only guarantee in a multiple LPAR environment, for a Mission & Time Critical LPAR to receive all available MSU resource, Soft Capping or not, is to inactivate all other LPARs!  Clearly this is not an acceptable policy for any installation, and so a best endeavours policy applies for PR/SM DC, GCL and Weight settings.

Conclusion: z/OS workloads change constantly, whether the time of day (E.g. On-Line, Batch) or period of the year (E.g. Weekly, Monthly, Quarterly, Yearly) or just by customer demand (E.g. 24 Hour Transaction Application).  Therefore a dynamic MSU management solution such as zDynaCap is arguably mandatory, implementing the optimum MSU management policy, whether for purely performance reasons, safeguarding the Mission & Time Critical workload isn’t impacted by lower priority workloads, or for cost reasons, optimizing MSU usage for the best possible monthly WLC cost.

In conclusion, not considering and arguably not implementing z/OS VWLC related pricing mechanisms is impractical, because:

  • The VWLC and AWLC related pricing metrics deliver the lowest cost per MSU for eligible z/OS software
  • When compared with PSLC, VWLC related pricing mechanisms deliver conservative ~5% cost savings
  • A pay for what you use and therefore Sub-Capacity pricing mechanism, not the installed MSU capacity
  • If extra MSU policy management granularity is required, consider 3rd party software such as zDynaCap

Software cost savings are not just for the privileged; they’re for everyone!