The IBM Mainframe: A Several Year Hardware Refresh Cycle?

Typically a new generation of IBM Mainframe server is released every three years or so, along with a number of function and performance upgrades.  In 2003, IBM released their Mainframe Charter that included a statement:

IBM lowered MSU values incorporated in the z990 microcode by approximately 10 percent, resulting in IBM software savings for IBM zSeries software products with MSU-based pricing.  These reduced MSUs do not indicate a change in machine performance. Superior performance and technology within the z990 has allowed IBM to provide improved software prices for key IBM zSeries operating system and middleware software products.

This terminology was named by some as the “Technology Dividend” where put simply, when upgrading IBM Mainframe servers, users would benefit from a ~10%+ software price versus performance benefit.  However, the z10 server model was the last IBM Mainframe series that benefitted from this hardware CPU chip related performance benefit.  Subsequent IBM Mainframe models have compensated for this slowing of hardware performance increase, by compensating with AWLC and AEWLC pricing models.  Therefore, unless your business has an absolute need for the “latest and greatest” IBM Mainframe server hardware, the realm of possibility exists that your business can extend the useful and cost efficient lifetime of your IBM Mainframe asset beyond the typical three year period…

As we all know, with every IT platform, there is a strong correlation between server hardware and associated Operating System.  Arguably the IBM Mainframe server has the best compatibility attribute, where there are many server hardware and Operating System interoperability scenarios.  A recent Statement Of Direction (SOD) for z/OS states:

Going forward, IBM intends to make new z/OS and z/OSMF releases available approximately every two years. Such a schedule would be intended to provide you with sufficient time to plan for new releases and to leverage them for the most business value. In addition, beginning with z/OS Version 2, IBM plans to provide five years of z/OS support, with three years of optional, fee-based extended service (5+3) as part of the new release cadence. Beginning with z/OSMF Version 2, IBM also plans to provide five years of z/OSMF support. However, similar to z/OSMF Version 1, optional extended service is not planned to be available for z/OSMF Version 2.

In addition, in z/OS V2.1, IBM plans to further leverage enhancements in the current IBM mainframe servers and storage control units. z/OS V2.1 is planned to IPL only on System z9 and later servers. Also, z/OS Version 2 is planned to require 3990 Model 3 (3990-3), 3990 Model 6 (3990-6), and later storage control units.

In attempt to simplify this scenario, in theory an IBM Mainframe customer could benefit from 5 years z/OS Version 2 support, with an IBM z9 or newer server.  In addition, this support could be extended for a further 3 years, for an extended service fee.  Therefore, from a software support perspective, there are no tangible cost considerations for extending the asset life of an IBM Mainframe from a 3 to 5 year cycle.

We must then consider the End of Marketing (EOM), also known as Withdrawal From Marketing (WDFM) and End Of Service (EOS) life cycles for the IBM Mainframe Server (Hardware).  Once again, when compared to other non-Mainframe platforms, the IBM Mainframe Server demonstrates an arguably unparalleled support cycle, where in the last 20 years or more, an average of 4.2 years sales and service, supplemented by an additional average of 7.1 years additional service applies.  Once again, as per z/OS Operating System support, the realm of possibility exists for extending the typical 3 year hardware refresh cycle to 5 years or longer.

When considering IBM Mainframe server hardware provision and support, there is one subtle difference that is not necessarily obvious, especially for those organizations that refresh their IBM Mainframe server every 3 years or so.  Clearly and stating the obvious, only IBM or a highly certified IBM System z Business partner can supply a latest generation IBM System z server or field upgrade option.  Conversely, there are a higher number of certified organizations that can provide IBM Mainframe hardware support services, allowing for a competitive and healthy 3rd party market for these services.  Additionally these companies also maintain inventories of equipment and have access to Microcode and Firmware upgrades that offer a possibility for performing field upgrades of EOM/WDFM servers.  One such company with a longevity and good track record of providing these value-added IBM Mainframe services from The United Kingdom is Blue Chip Customer Engineering.  As per any other competitive market place, arguably each and every IBM Mainframe user might consider obtaining a comparative hardware support services quotation for their business, whether they’re using the current latest and greatest IBM System z server model, or a slightly older (E.g. 4-8+ Years) model.

In conclusion, there are always options for the cost savvy business to reduce costs.  In the IBM Mainframe environment, soft capping via standard IBM Defined Capacity (DC) or Group Capacity Limit (GCL) function is an option, intelligent soft capping via a 3rd party product such as zDynaCap might be an option, or leveraging from the latest Absolute Capping IBM feature also applies.  Moreover, exploring the 3rd party hardware support services market might prove to be a very simple and commercial exercise that could decrease IBM Mainframe TCO, while extending asset life accordingly.

z/OS Workload Manager (WLM): Balancing Cost & Performance

A sophisticated mechanism is required to orchestrate the allocation of System z resources (E.g. CPU, Memory, I/O) to multiple z/OS workloads, requiring differing business processing priorities. Put very simply, a mechanism is required to translate business processing requirements (I.E. SLA) into an automated and equitable z/OS performance manager. Such a mechanism will safeguard the highest possible throughput, while delivering the best possible system responsiveness. Ideally, such a mechanism will assist in delivering this optimal performance, for the lowest cost; for z/OS, primarily Workload License Charges (WLC) related. Of course, the Workload Management (WLM) z/OS Operating System component delivers this functionality.

A rhetorical question for all z/OS Performance Managers and z/OS MLC Cost Managers would be “how much importance does your organization place on WLM and how proactively do you manage this seemingly pivotal z/OS component”? In essence, this seems like a ridiculous question, yet there is evidence that suggests many organizations, both customer and ISV alike, don’t necessarily consider WLM to be a fundamental or high priority performance management discipline. Let’s consider several reasons why WLM is a fundamental component in balancing cost and performance for each and every z/OS environment:

  • CPU (MSU) Resource Capping: Whatever the capping method (I.E. Absolute, Hard, Soft), WLM is a controlling mechanism, typically in conjunction with PR/SM, determining when capping is initiated, how it is managed and when it is terminated. Therefore from a dispassionate viewpoint, any 3rd party ISV product that performs MSU optimization via soft capping mechanisms should ideally consider the same CPU (E.g. SMF Type 70, 72, 99) instrumentation data as WLM. Some solutions don’t offer this granularity (E.g. AutoSoftCapping, iCap).
  • MLC R4HA Cost Management: WLM is the fundamental mechanism for controlling this #1 System z software TCO component; namely WLM collects 48 consecutive metric CPU MSU resource usage every 5 Minutes, commonly known as the Rolling 4 Hour Average (R4HA). In an ideal world, an optimally managed workload that generates a “valid monthly peak”, will fully utilize this “already paid for” available CPU MSU resource for the remainder of the MLC eligible month (I.E. Start of the 2nd day in a calendar month, to the end of the 1st day in the next calendar month). More recently, Country Multiplex Pricing (CMP) allows an organization to move workloads between System z server (I.E. CPC) structures, without cost consideration for cumulative R4HA peaks. Similarly, Mobile Workload Pricing (MWP) reporting will be simplified with WLM service definitions in z/OS 2.2. Therefore it seems prudent that real-time WLM management, both in terms of real-time reporting and pro-active decision making makes sense.
  • System z Server CPU Management: As System z server CPU chips evolve (E.g. CPU Chip Cache Hierarchy and Relative Nest Intensity), there are complementary changes to the z/OS Operating System management components. For example, HiperDispatch Mode delivers CPU resource usage benefit, considering CPU chip cache resources, intelligently allocating workload to as few logical processors as possible. It therefore follows that prioritization of workloads via WLM policy definitions becomes increasingly important. In this instance one might consider that CPU MF (SMF Type 113) and WLM Topology (SMF Type 99) are complementary reporting techniques for System z server design and management.

Since its announcement in September 1994 (I.E. MVS/ESA Version 5), WLM has evolved to become a fully-rounded and highly capable z/OS System Resources Manager (SRM), simply translating business prioritization policies into dynamic function, optimizing System z CPU, Memory and I/O resources. More recently, WLM continues to simplify the management of CPU chip cache hierarchy resources, while reporting abilities gain in strength, with topology reporting and the promise of simplified MWP reporting. Moreover, WLM resource management becomes more granular and seemingly the realm of possibility exists to “micro manage” System z performance, as and if required. Conversely, WLM provides the opportunity to simplify System z performance management, with intelligent workload differentiation (I.E. Subsystem Enclave, Batch, JES, USS, et al).

Quite simply, IBM are providing the instrumentation and tools for the 21st Century System z Performance and Software Cost Subject Matter Expert (SME) to deliver optimal performance for minimal cost. However, it is incumbent for each and every System z user to optimize software TCO, proactively implementing new processes and leveraging from System z functions accordingly.

Returning to that earlier rhetorical question about the importance of WLM; seemingly its importance is without doubt, primarily because of its instrumentation and management abilities of increasingly cache rich System z CPU chips and its fundamental role in controlling CPU MSU resource, vis-à-vis the R4HA.

Although IBM will provide the System z user with function to optimize system performance and cost, for obvious commercial reasons IBM will not reduce the base cost of System z MLC software. However, recent MLC pricing announcements, namely Country Multiplex Pricing (CMP), Mobile Workload Pricing (MWP) and Collocated Application Pricing (zCAP) provide tangible options to reduce System z MLC TCO. Therefore the System z user might need to consider how they can access real-time WLM performance metrics, intelligently combining this instrumentation data with function to intelligently optimize CPU MSU resource, managing the R4HA accordingly.

Workload X-Ray (WLXR) from zIT Consulting simplifies WLM performance reporting, enabling users to drill down into the root cause of performance variances in a very fast and easy way. WLXR assists in root cause problem determination by zooming in, starting from a high level overview, going right down to detailed Service Class performance information, such as the Performance Index (PI), showing potential bottleneck situations during peak time. Any system overhead considerations are limited, as WLXR delivers meaningful real time information on a “need to know” basis.

A fundamental design objective for WLXR is data reduction, only delivering the important information required for timely and professional workload management. Straight to the point information instead of data overload, sometimes from a plethora of data sources (E.g. SMF, System Monitors, et al). WLXR incorporates the following easy-to-use functions:

  • Simplified Data Collection & Storage: Minimal system overhead TCP/IP based agents periodically (E.g. 5, 15, 60 Minutes) collect CPU (Type 70) and WLM (Type 72) data. Performance data is stored centrally in near real-time, building a historical repository with intelligent analytics for meaningful information presentation.
  • Intelligent GUI Based Information Presentation: Meaningful decision based reports and graphs detailing CPU (E.g. MSU, R4HA, Weight) and WLM (E.g. Service Class, Performance Index, Response Time, Transaction Workflow) resource usage. A drill-down design provides a granularity of data presentation, for Management Summary to 3rd Level Technical Diagnostics use.
  • Corporate Identity Branding: A modular template design, allowing for easy corporate identity branding, with flexibility to easily add additional reports, as and if required.

Without doubt, WLM is a significant z/OS System Resources Management function, simplifying the translation of business workload requirements (I.E. Service Level Agreement) into timely and proactive allocation of major System z hardware resources (I.E. CPU, Memory, I/O). This management of System z resources has been forever thus for 20+ years, while WLM has always offered “software cost control” functionality, working with the various and evolving CPU capping techniques. What might not be so obvious, is that there is a WLM orientated price versus performance correlation, which has become more evident in the last 5 years or so. Whether Absolute Capping, HiperDispatch, Mobile Workload Pricing, Country Multiplex Pricing or evolving Soft Capping techniques, the need for System z users to integrate z/OS MLC pricing considerations alongside WLM performance based management is evident.

Historically there was not a clear and identified need for a z/OS Performance/Capacity Manager to consider MLC costs in their System z server designs. However, there is a clear and present danger that this historic modus operandi continues and there will only be one financial winner, namely IBM, with unnecessarily high MLC charges. Each and every System z user, whether large or small, can safeguard the longevity of their IBM Mainframe platform by recognizing and deploying proactive and current System z MLC cost management processes.

All too often it seems that capping can be envisaged as punitive, degrading system performance to reduce System z MLC costs. Such a notion needs to be consigned to history, with a focussed perspective on MSU optimization, where the valuable and granular MSU resource is allocated to the workload that requires such CPU resource, with near real-time performance profiling. If we perceive MSU optimization to be R4HA based and that IBM are increasing WLM function to assist this objective, CPU capping can be a benefit that does not adversely impact performance. As previously stated, once a valid R4HA peak has occurred, that high MSU watermark is available for the remainder of the MLC billing period. Similarly at a more granular level, once a workload has peaked and its MSU usage declines, the available MSU can be redirected to other workloads. With the introduction of Country Multiplex Pricing, System z users no longer need to concern themselves about creating a higher R4HA peak, when moving workloads between System z servers.

Quite simply, from the two most important perspectives, performance and cost optimization, WLM provides the majority of functionality to assist System z users get the best performance for the lowest cost. Analytics based products like Workload X-Ray (WLXR) assist this endeavour, analysing WLM data in near real-time from a performance and MLC cost perspective. It therefore follows that if this important information is also available for sophisticated MSU optimization solutions, which consider WLM performance (E.g. zDynaCap, zPrice Manager), then proactive performance and cost management follows. It’s hard to envisage how a fully-rounded MSU optimization decision can be implemented in near real-time, from an MSU optimization solution that does not consider WLM performance metrics…

z/OS Soft Capping: Balancing Cost & Performance

Historically each and every LPAR was assigned a Relative Weight value; where a more meaningful description would be the initial processing weight. This relative weight value is used to determine which LPAR gains access to resources, where multiple LPARs are competing for the same resource. Being unit-less is one minor challenge of the relative weight value, meaning that it has no explicit CPU capacity or resource value. Typically installations would use a simple multiple of ten metric, most likely 1000, and allocate weights accordingly (E.g. 600=60%, 300=30%, 10=10%, et al). Therefore during periods of resource contention, PR/SM would allocate resources to the requisite LPAR, based upon its relative weight.

Using relative weight to classify all LPARs as equal, at least from a generic class viewpoint, does have some considerations; primarily differentiating between Production and Non-Production workloads. Restricting a workload to its relative weight share of resources is known as Hard Capping. This setting is typically used to restrict Non-Production (E.g. Test) environments to their allocated resource and is also useful for cost control (E.g. Outsourcers), knowing that the LPAR will never consume more than its allocated relative weight allowance.

Hard Capping behaviour changes dependent on the use of the HiperDispatch setting. When HiperDispatch is not chosen, capping is performed at the Logical CP level, where the goal is for each logical CP to receive its relative CP share, based on the relative weight setting. When HiperDispatch is active, vertical as opposed to horizontal CPU management applies. So, a High categorization dictates capping at 100% of the logical CP, whereas a Medium or Low setting allows for resource sharing based on a relative weight per CP basis.

The Intelligent Resource Director (IRD) function provides more advanced relative weight management, automating management of CPU resources and a subset of I/O resources. Workload Manager (WLM) manages physical CPU resource across z/OS images within an LPAR cluster based on service class goals. IRD is implemented as a collaboration between the WLM function and the PR/SM Logical Partitioning (LPAR) hypervisor:

  • Logical CP Management: dynamically allocating logical processors (E.g. Vary On-Line/Off-Line)
  • Relative Weight Management: dynamically redistributing CPU resource as per LPAR weights
  • CHPID Management: dynamically assigning logical channel paths between eligible LPARs

IRD optimizes resource usage, enabling WLM to deliver workload goals.

The use of relative weight in association with Hard Capping and/or IRD/WLM granularity has become somewhat limited for most Mainframe installations with the advent of Sub-Capacity pricing (I.E. MLC via SCRT/R4HA). Primarily because there is no direct correlation to manage CPU resource at a meaningful level, namely the MSU (vis-à-vis CPU MIPS) metric.

Defined Capacity (DC) provides Sub-Capacity CEC pricing by allowing definition of LPAR capacity with a granularity of 1 MSU. In conjunction with the WLM function, the Defined Capacity of an LPAR dictates whether Soft Capping is invoked or not. At this juncture, we should consider how and when WLM measures CPU resource usage and if and when Soft Capping is activated and deactivated:

WLM is responsible for taking MSU utilization samples for each LPAR in 10-second intervals. Every 5 minutes, WLM documents the highest observed MSU sample value from the 10-second interval samples. This process always keeps track of the past 48 updates taken for each LPAR. When the 49th reading is taken, the 1st reading is deleted, and so on. These 48 values continually represent a total of 5 minutes * 48 readings = 240 minutes or the past 4 hours (I.E. R4HA). WLM stores the average of these 48 values in the WLM control block RCT.RCTLACS. Each time RMF (or BMC CMF equivalent) creates a Type 70 record, the SMF70LAC field represents the average of all 48 MSU values for the respective LPAR a particular Type 70 record represents. Hence, we have the “Rolling 4 Hour Average”. RMF gets the value populated in SMF70LAC from RCT.RCTLACS at the time the record is created.

SCRT also uses the Type 70 field SMF70WLA to ensure that the values recorded in SMF70LAC do not exceed the maximum available MSU capacity assigned to an LPAR. If this ever happens (due to Soft Capping or otherwise) SCRT uses the value in SMF70WLA instead of SMF70LAC. Values in SMF70WLA represent the total capacity available to the LPAR.

We should also consider the two possibilities for MLC software payment (I.E. SCRT) based upon MSU resource usage. Quite simply, the MSU value passed for SCRT invoice consideration is the R4HA or the Defined Capacity, whichever is the lowest. Put another way; if the R4HA exceeds Defined Capacity, Soft Capping applies to the LPAR.

The primary disadvantage of Soft Capping is that the Defined Capacity setting is somewhat static; it is manually defined once, maybe several times a day for workloads with distinct characteristics (E.g. On-Line, Batch, et al), but dynamic DC management based upon inter-related LPAR behaviour is at best, evolving. The primary considerations for Soft Capping are:

  • An LPAR can only be managed via Soft Capping or Hard Capping; not both
  • DC rules only applies to General Purpose CP’s (Hard Capping for Specialty Engines is allowed)
  • An LPAR must be defined with shared CP’s (dedicated CP’s not allowed)
  • All LPAR Sub-Capacity eligible products have the same MSU capacity (I.E. DC)

Soft Capping is relatively simple to implement and typically generates MLC software costs savings, with minimal impact.

Group Capacity Limit (GCL) provides an extension to the Defined Capacity (DC) Soft Capping function. GCL allows an MSU limit for total usage of all group LPARs, with a granularity of 1 MSU. The primary considerations for GCL are:

  • Works with DC LPAR capacity settings
  • Target share does not exceed DC
  • Works with IRD
  • Multiple CEC groups allowed; but an LPAR may only be defined to one group
    An LPAR must be defined with shared CP’s, with WAIT COMPLETION = NO specification

It is possible to combine IRD weight management with the GCL function. Based on installation policy, IRD can modify the relative weight setting to redistribute capacity resource within an LPAR cluster.

However, IRD weight management is suspended when GCL is in effect, because LPAR resource entitlement within a capacity group can be (I.E. Pre zxC12) derived from the current weight. Hence the LPAR might get allocated an unacceptable low weight setting, generating a low GCL entitlement.

GCL also allows for MSU to be shared between LPARs in a group, where one LPAR would be a donator and another would be a receiver. Therefore the customer classifies their LPARs accordingly and when a high-priority LPAR requires additional MSU resource, it will be allocated from a lower priority LPAR, if available. This provides a modicum of flexibility, but by definition, peak workloads are not predictable and typically require a significantly higher amount of MSU for a short time period. Typically this requirement will not be satisfied with the GCL function.

Soft Capping techniques, either at the individual (DC) or group (GCL) level deliver cost saving benefit, but a fine granularity of management is required to balance cost saving versus associated performance considerations. The primary challenges associated with Soft Capping are its interactions with workload characteristics and an inability to dynamically manage MSU allocation, in-line with the R4HA. Put another way, the R4HA is derived from 48*5 Minute samples, whereas DC and GCL settings are typically defined on an infrequent (E.g. Monthly or longer) basis.

As z/OS evolves, further in-built function is available to manage MSU capacity. zSeries Capacity Provisioning Manager (CPM) is designed to simplify the management of temporary capacity, defined capacity and group capacity. The scope of z/OS Capacity Provisioning is to address capacity requirements for relatively short term workload fluctuations for which On/Off Capacity on Demand or Soft Capping changes are applicable. CPM is not a replacement for the customer derived Capacity Management process. Capacity Provisioning should not be used for providing additional capacity to systems that have Hard Capping (initial capping or absolute capping) defined.

With the introduction of z/OS 2.1, CPM functionality incorporates Soft Capping support via the DC and GCL functions. CPM functions from a set of installation defined policies and parameters, where the CPM server receives three types of input:

  • Domain Configuration: defines the CPCs and z/OS systems to be managed
  • Policy: contains the information as to which work is eligible, for which conditions and during which timeframes and capacity increases for constrained workloads
  • Parameter: contains environment descriptors (E.g. UNIX Environment, Installation Options, et al)

From a customer viewpoint, policy definition allows them to define the provision of CPU resource:

  • Date & Time: When capacity provisioning is allowed
  • Workload: Which service class qualifies for provisioning?
  • CPU Resource: How much additional MSU capacity can be allocated?

CPM provides more function when compared with Defined Capacity and Group Capacity Limit Soft Capping techniques. Therefore allowing for time schedules to be defined, workloads to be categorized and MSU resource to be allocated in a dynamic and granular manner.

A modicum of complexity exists when considering the arguably most important factor for CPM policy definition, namely the Performance Index (PI):

  • Activation: PI of service class periods must exceed the activation threshold for a specified duration, before the work is considered as eligible.
  • Deactivation: PI of service class periods must fall below the deactivation threshold for a specified duration, before the work is considered as ineligible.
  • Null: If no workload condition is specified a scheduled activation/deactivation is performed; with full capacity as specified in the rule scope, unconditionally at the start and end times of the time condition.

For workload based provisioning it is a necessary condition that the current system Performance Index exceeds the specified customer policy PI metric. One must draw one’s own conclusions regarding PI criteria settings, but to date, they’re largely based on arguably complex mathematical formulae, which perhaps is not practicable, especially from a simple management viewpoint.

With the requisite hardware (I.E. zxC12+) and Operating System levels (I.E. z/OS 1.13+), CPM provides extra functionality for the customer to implement granular Soft Capping techniques to balance cost and performance. When compared with Defined Capacity and Group Capacity Limit techniques, CPM delivers increased granularity for managing capacity dynamically, based on customer derived policies, recognizing time slots, workloads and MSU resource increases accordingly.

From a big picture viewpoint, without doubt, we must recognize the fundamental role that WLM plays in Soft Capping. Quite simply, the 48*5 Minute MSU resource samples dictate whether a workload will be eligible for Soft Capping or not and from a cumulative viewpoint, these MSU samples dictate the R4HA metric. Based on this observation, efficient and functional Soft Capping must be workload based (I.E. WLM Service Class), be dynamic and operational on a 24*7 basis, because workload peaks are never predictable, while balancing MSU resource accordingly. Of course, simplicity of implementation and management, supplemented by meaningful reporting is mandatory.

Once again, observing the 48*5 Minute MSU resource samples from a R4HA viewpoint, if a workload was to increase MSU usage by an average of 50% for 1 Hour (I.E. 12 Samples), and decrease MSU usage by an average of 20% for 2.5 Hours (I.E. 30 Samples), from an average viewpoint, the R4HA has remained static. Therefore an optimum Soft Capping technique needs to recognize WLM service class requirements, reacting in a timely manner, increasing and decreasing MSU usage, to safeguard workload performance for Time Critical workloads, while optimizing SCRT MLC cost.

zDynaCap delivers automated capacity balancing within CPCs, Capacity Groups or Groups of LPARs. Central to zDynaCap are the predefined balancing policies. Within these balancing policies, users define their MSU ranges of Groups and LPARs and also the priorities of the associated LPAR Workload. zDynaCap continually monitors overall usage and compares this to the available capacity and the user defined MSU balancing policies. For example, should a high priority workload on one LPAR not get enough capacity, while a low priority workload on another within the group gets too much capacity, available MSU capacity is distributed according to customer derived balancing policies. Only if there is no leftover capacity to be rescheduled within the defined Group, and if the high or medium priority workload will be slowed down, will zDynaCap add MSU.

With zDynaCap Capacity Balancing, available MSU capacity is balanced within LPAR groups, safeguarding that during peak time the mission critical workload is processed as per business expectations (E.g. SLA/KPI) for the lowest possible MLC cost.

In conclusion, given the significance of IBM MLC software (E.g. z/OS, CICS, DB2, IMS, WebSphere MQ, et al) costs, arguably every Mainframe environment should deploy a capping technique for cost optimization. Hard Capping might work for some, but in all likelihood, Soft Capping is the primary choice for most Mainframe environments. For sure, IBM have delivered several Soft Capping techniques, with varying levels of function and granularity, namely Defined Capacity, Group Capacity Limit (GCL) and the zSeries Capacity Provisioning Manager (CPM). It was forever thus and the ISV community exists because they specialize, architect and deliver specialized solutions and zDynaCap is such a solution, recognizing the fundamental rules of IBM Mainframe Soft Capping, namely the underlying WLM and R4HA foundation.

Mainframe Server Planning: Vendor Interaction

In the last few weeks I have encountered a couple of scenarios regarding Mainframe Server upgrades that have surprised me somewhat.  The first was at the annual UK GSE conference during November 2013, where one of the largest UK Mainframe customers stated “we had problems regarding the capacity sizing of the IBM Mainframe server installed and our vendor was not very helpful in resolving this challenge with us”.  The second was a European customer with 2 aging servers deployed, z9 BC, and they had asked their IBM Mainframe server vendor to provide an upgrade quotation.  The server vendor duly replied, providing a like-for-like upgrade quotation, 2 new zBC12 servers, which at first glance seemed to be a valid configuration.

The one thing in common for these 2 vastly different Mainframe customers, the first very large, the second quite small, is that inadvertently they didn’t necessarily engage their respective vendors with the best set of questions or indeed terms of reference; while the vendors might say “ask me no questions and I’ll tell you no lies”…

For the 2nd scenario, I was asked to quickly review the configuration provided.  My first observation was to consolidate both workloads on 1 server.  The customer confirmed, there was no business reason to have 2 servers, it was historic, and there wasn’t even a SYSPLEX between the 2 z9 BC servers.  The historic reason for the 2 z9 BC servers was the number of General Purpose (GP) engines supported.  My second observation was that software licensing could be simplified and optimized with aggregated MSU and use of the AEWLC pricing model.  So within ~1 hour, the customer had a significant potential to dramatically reduce costs.

We then suggested an analysis of their configuration with 2 software products, PerfTechPro for z/OS and zDynaCap.  They already had the SMF data, so using the simulation abilities of these products, the customer quickly confirmed they could consolidate their workloads onto 1 zBC12, deploy zIIP processors to offload ~15% CPU usage from GP, and control MSU allocation with zDynaCap, saving another ~10% of CPU.  For this customer, a small investment in software products reduced their server upgrade costs by ~€400,000 in year 1, with similar software savings, each and every year forever more.  Although they didn’t have the skills in-house from a Mainframe Capacity Planning and software licensing viewpoint, this customer did eventually ask the right questions, and the rest as they say is history!

No man or indeed Mainframe customer is an island, so don’t be afraid to ask questions of your vendors or business partners!

From a cost viewpoint, both long-term (TCO) and day 1 (TCA), the requirement to deploy the optimum Mainframe server configuration from a capacity viewpoint cannot be under estimated, both in terms of hardware costs, but more importantly, associated software costs.  It therefore follows that Mainframe Capacity Planning and Mainframe Software Licensing knowledge is imperative, but I’m not so sure there are that many Mainframe customers that have clearly defined job roles for such disciplines.

To generalize, always a dangerous thing, typically the larger Mainframe customer does have skilled and seasoned personnel for the Capacity Planning discipline, while the smaller Mainframe user might rely on a generic Systems Programmer or maybe even rely on their vendor to size their Mainframe servers.  From a Mainframe software licensing viewpoint, there seems to be no general rule-of-thumb, as sometimes the smaller customer has significant knowledge and experience, whereas the larger Mainframe customer might not.  Bottom Line: If the Mainframe customer doesn’t allocate the optimum capacity and associated software licensing metrics for their installation, problems will arise, probably for several years or more!

Are there any simple solutions or processes that can assist Mainframe customers?

The first and most simple observation is to engage your vendor and safeguard that they generate the final Mainframe server configuration that is used for Purchase Order activities.  For sure, the customer will have their capacity plan and perhaps a “draft” server configuration, but even in these instances, the vendor should QA this data, refining the bill of materials (E.g. Hardware) accordingly.  Therefore an iterative process occurs between customer and vendor, but the vendor is the one that confirms the agreed configuration is fit for purpose.  In the unlikely event there are challenges in the future, the customer can work with their vendor to find a solution, as opposed to the example stated above where the vendor left their customer somewhat isolated.

The second observation is leverage from the tools and processes that are available, both generally available and internal for vendor pre sales personnel.  Seemingly everybody likes something for nothing and so the ability to deploy “free” tools will appeal to most.

For Mainframe Capacity Planning, in addition to the standard in-house processes, whether bespoke (E.g. SAS, MXG, MICS based) or a packaged product, there are other additional tools available, primarily from IBM:

zPCR (Processor Capacity Reference) is a generally available Windows PC based tool, designed to provide capacity planning insight for IBM System z processors running various z/OS, z/VM, z/VSE, Linux, zAware, and CFCC workload environments on partitioned hardware.  Capacity results are based on IBM’s most recently published LSPR data for z/OS.  Capacity is presented relative to a user-selected Reference-CPU, which may be assigned any capacity scaling-factor and metric.

zCP3000 (Performance Analysis and Capacity Planning) is an IBM internal tool, Windows PC based, designed to for performance analysis and capacity planning simulations for IBM System z processors, running various SCP and workload environments.  It can also be used to graphically analyse logically partitioned processors and DASD configurations.  Input normally comes from the customer’s system logs via a separate tool (I.E. z/OS SMF via CP2KEXTR, VM Monitor via CP3KVMXT, VSE CPUMON via VSE2EDF).

zPSG (Processor Selection Guide) is an IBM internal tool, Windows PC based, designed to provide sizing approximations for IBM System z processors intended to host a new application, implemented using popular, commercially available software products (E.g. WebSphere, DB2, ODM, Linux Apache Server).

zSoftCap (Software Migration Capacity Planning Aid) is a generally available Windows PC based tool, designed to assess the effect on IBM System z processor capacity, when planning to upgrade to a more current operating system version and/or major subsystems versions (E.g. Batch, CICS, DB2, IMS, Web and System).  zSoftCap assumes that the hardware configuration remains constant while the software version or release changes.  The capacity implication of an upgrade for the software components can be assessed independently or in any combination.

zBNA (System z Batch Network Analysis) is a generally available Windows PC based tool, designed to understand the batch window, for example:

  • Perform “what if” analysis and estimate the CPU upgrade effect on batch window
  • Identify job time sequences based on a graphical view
  • Filter jobs by attributes like CPU time / intensity, job class, service class, et al
  • Review the resource consumption of all the batch jobs
  • Drill down to the individual steps to see the resource usage
  • Identify candidate jobs for running on different processors
  • Identify jobs with speed of engine concerns (top tasks %)

BWATOOL (Batch Workload Analysis Tool) is an IBM internal tool, Windows PC based, designed to analyse SMF type 30 and 70 data, producing a report showing how long batch jobs run on the currently installed processor.  Both CPU time and elapsed time are reported. Similar results can then be projected for any IBM System z processor model. Basic questions that can be answered by BWATOOL include:

  • What jobs are good candidates for running on any given processor?
  • How much would jobs benefit from running on a faster processor?
  • For jobs within a critical path (batch window), what overall change in elapsed time might occur with a new processor?

zMCAT (Migration Capacity Analysis Tool) is an IBM internal tool, Windows PC based, designed to compare the performance of production workloads before and after migration of the system image to a new processor, even if the number of engines on the processor has changed.  Workloads for which performance is to be analysed must be carefully chosen because the power comparison may vary considerably due to differing use of system services, I/O rate, instruction mix, storage reference patterns, et al.  This is why customer experiences are unique from an internal throughput ratio (ITRR) based on LSPR benchmark data.

zTPM (Tivoli Performance Modeler) is an IBM internal tool, Windows PC based designed to let you build a model of a z/OS based IBM System z processor, and then run various “what if scenarios”.  zTPM uses simulation techniques to let you model the impact of changes on individual workload performance.  zTPM uses RMF or CMF reports as input.  Based on these reports, zTPM can create summary charts showing LPAR as well as workload utilization.  An automated Build function lets you build a model that represents the system for any reporting interval.  Once the model is built, you can make changes to see the impact on workload performance.  zTPM is also available as an IBM software product offering.

Therefore there are numerous tools available from IBM to assist their customers determine optimum Mainframe server capacity requirements.  Some of these tools are generally available without engaging the IBM account team, but others are internal to IBM, and for that reason alone, Mainframe customers must engage their IBM Mainframe account team to participate in their capacity planning activities.  Additionally, as the only supplier of Mainframe Servers, IBM have a wealth of knowledge and indeed a responsibility and generally a willingness to assist their customers deploy the right Mainframe server configuration from day 1.

As a customer, don’t be afraid to engage external 3rd parties to perform a sanity check of your thinking and activities, clearly IBM as they will be fulfilling your IBM Mainframe server order.  However, consider engaging other capacity/performance and software licensing specialists as their experience incorporates many customers, as opposed to an insular view.  Moreover, such 3rd parties probably utilize their own software tools or products to assist in this most important of disciplines.

In conclusion, as always, the worst question is the one not asked, and for this most fundamental of processes, not collaborating with your vendor and the wider community, might leave you as an individual exposed and isolated, and your company exposed to the consequences of an undersized or oversized Mainframe sever configuration…

IBM Mainframe: Workload License Charges (WLC) Pros & Cons

It is estimated that less than half of eligible IBM Mainframe customers deploy the VWLC pricing mechanism, which in theory, is the lowest cost IBM software pricing metric.  Why?  In the first instance, let’s review the terminology…

Workload License Charges (WLC) is a monthly software license pricing metric applicable to IBM System z servers running z/OS or z/TPF in z/Architecture (64-bit) mode.  The fundamental ethos of WLC is a “pay for what you use” mechanism, allowing a lower cost of incremental growth and the potential to manage software cost by managing associated workload utilization.

WLC charges are either VWLC (Variable) or FWLC (Flat).  Not all IBM Mainframe software products are classified as VWLC eligible, but the major software is, including z/OS, CICS, DB2, IMS and WebSphere MQ, where these products are the most expensive, per MSU.  What IBM consider to be legacy products, are classified as FWLC.  More recently a modification to the VWLC mechanism was announced, namely AWLC (Advanced), strictly aligned with the latest generation of zSeries servers, namely zEC12, z196 and z114.  For the smaller user, the EWLC (Entry) mechanism applies, where AEWLC would apply for the z114 server.  There is a granular cost structure based on MSU (CPU) capacity that applies to VWLC and associated pricing mechanisms:

Band MSU Range
Base 0-3 MSU
Level 0 4-45 MSU
Level 1 46-175 MSU
Level 2 176-315 MSU
Level 3 316-575 MSU
Level 4 576-875 MSU
Level 5 876-1315 MSU
Level 6 1316-1975 MSU
Level 7 1976+ MSU

Put simply, as the MSU band increases, the related cost per MSU decreases.

IBM Mainframe users can further implement cost control by specifying how much MSU resource they use by deploying Sub-Capacity and Soft Capping techniques.  Defined Capacity (DC) allows the sizing of an LPAR in MSU, and so said LPAR will not exceed this MSU amount.  Group Capacity Limit (GCL) extends the Defined Capacity principle for a single LPAR to a group of LPARs, and so allowing MSU resource to be shared accordingly.  A potential downside of GCL is that is one LPAR of the group can consume all available MSU due to a rogue transaction (E.g. loop).

Sub-Capacity software charges are based upon LPAR hardware utilization, where the product runs, measured in hourly intervals.  To smooth out isolated usage peaks, a Rolling 4-Hour Average (R4HA) is calculated for each LPAR combination, and so software charges are based on the Monthly R4HA peak of appropriate LPAR combinations (I.E. where the software product runs) and not based on individual product measurement.

Once a Defined Capacity LPAR is deployed, this informs WLM (Workload Manager) to monitor the R4HA utilization of that LPAR.  If the LPAR R4HA utilization is less than the Defined Capacity, nothing happens.  If the LPAR R4HA utilization exceeds the Defined Capacity, then WLM signals to PR/SM and requests that Soft Capping be initiated, constraining the LPAR workload to the Defined Capacity level.

If a user chooses a Sub-Capacity WLC pricing mechanism, they will be required by IBM to submit a monthly Sub-Capacity Reporting Tool (SCRT) report.  Monthly WLC invoices are based upon hourly utilization metrics of LPAR hardware utilization, where the software product executes.  The cumulative R4HA and bottom line WLC billing metric is calculated for each product and associated LPAR group and not based on individual product measurement.

Bottom Line: From a Soft Capping viewpoint, the customer only pays for WLC software based upon the Defined Capacity (DC) or Rolling 4-Hour Average (R4HA), whichever is the lowest.  So whether a customer uses Soft Capping or not, in all likelihood, there will be occasions when their workload R4HA is lower than their zSeries server MSU capacity.

So, at first glance, VWLC seems to provide a compelling pricing metric, based upon Sub-Capacity and a pay for what you use ethos, and so why wouldn’t an IBM Mainframe user deploy this pricing metric?

The IBM Planning for Sub-Capacity Pricing (SA22-7999-0n) manual states “For IBM System z10 BC and System z9 BC environments, and z890 servers, EWLC pricing is the default for z/OS systems, and Sub-Capacity pricing is always the best option.  For IBM zEnterprise 114, environments, AEWLC pricing is the default for z/OS systems, and Sub-Capacity pricing is always the best option.  For IBM zEnterprise 196, System z10 EC and System z9 EC environments, and other zSeries servers, Sub-Capacity pricing is cost-effective for many, but not all, customers.  You might even find that Sub-Capacity pricing is cost effective for some of your CPCs, but not others (although if you want pricing aggregation, you must always use the same pricing for all the CPCs in the same sysplex)”.

Conclusion: For all small Mainframe users qualifying for the EWLC (AEWLC) pricing metric, arguably this pricing mechanism is mandatory.  For the majority of larger Mainframe users, the same applies, although a granularity of adoption might be required.  IBM also have a disclaimer “Once you decide to use Sub-Capacity pricing for a specific operating system family, you cannot return to the alternative pricing methods for that operating system family on that CPC.  For example, once you select WLC you may not switch back to PSLC without prior IBM approval”.  However, the requisite contractual exit clause option does exist; the customer can switch back to the PSLC pricing metric.

Some IBM Mainframe users might object to a notion of Soft Capping, relying upon their tried and tested methodology of LPAR management via the number of CPs allocated and associated PR/SM Weight.  This is seemingly a valid notion and requirement, prioritizing performance ahead of cost optimization.

Conclusion: As previously indicated, with VWLC, SCRT invoices are generated upon a premise of the customer only pays for WLC software based upon the Defined Capacity (DC) or Rolling 4-Hour Average (R4HA), whichever is the lowest.  So the VWLC pricing mechanism should deliver a granularity of cost savings, typically higher for a Soft Capping environment.

Some IBM Mainframe users might just believe that nothing can match their Parallel Sysplex Licensing Charge (PSLC) mechanism, first available in the late 1990’s, which might be attributable to other 3rd party ISV’s who cannot and will not allow for their software to be priced on a Sub-Capacity basis.  In reality, adopting the VWLC pricing mechanism delivers ~5% cost savings when compared with PSLC, as indicated by the IBM Planning for Sub-Capacity Pricing Manual (SA22-7999-0n) and related Sub-Capacity Planning Tool (SCPT).

Conclusion: Adopting Sub-Capacity based pricing metrics can only be a good thing.  If your 3rd party ISV supplier doesn’t recognise Sub-Capacity pricing, whether MIPS or MSU based, perhaps you should consider your relationship with them.  Regardless, the z10 server was the last IBM Mainframe to incorporate the “Technology Dividend” solely based on faster CPU chips.  The lower cost WLC pricing metric is now only available with the AWLC and related (E.g. AEWLC) pricing metrics, as per the z196, z114 and zEC12 servers.

Some customers might state that there is a lack of function or granularity of policy definition for IBM supplied Soft Capping (E.g. DC, GCL) or Workload Management (WLM) techniques.  To some extent this is a valid argument, but wasn’t it forever thus with IBM function?  Sub-Capacity implementation is possible via IBM, as is Workload Management (WLM), Soft Capping or not, but should the customer require extra functionality, 3rd party software solutions are available.

The zDynaCap software solution from zIT Consulting delivers a “Capacity Balancing” mechanism, integrating with R4HA and WLM methodologies, but constantly monitoring MSU usage to determine whether CPU resource can be reallocated to Mission & Time Critical workloads, based upon granular customer policies.  The only guarantee in a multiple LPAR environment, for a Mission & Time Critical LPAR to receive all available MSU resource, Soft Capping or not, is to inactivate all other LPARs!  Clearly this is not an acceptable policy for any installation, and so a best endeavours policy applies for PR/SM DC, GCL and Weight settings.

Conclusion: z/OS workloads change constantly, whether the time of day (E.g. On-Line, Batch) or period of the year (E.g. Weekly, Monthly, Quarterly, Yearly) or just by customer demand (E.g. 24 Hour Transaction Application).  Therefore a dynamic MSU management solution such as zDynaCap is arguably mandatory, implementing the optimum MSU management policy, whether for purely performance reasons, safeguarding the Mission & Time Critical workload isn’t impacted by lower priority workloads, or for cost reasons, optimizing MSU usage for the best possible monthly WLC cost.

In conclusion, not considering and arguably not implementing z/OS VWLC related pricing mechanisms is impractical, because:

  • The VWLC and AWLC related pricing metrics deliver the lowest cost per MSU for eligible z/OS software
  • When compared with PSLC, VWLC related pricing mechanisms deliver conservative ~5% cost savings
  • A pay for what you use and therefore Sub-Capacity pricing mechanism, not the installed MSU capacity
  • If extra MSU policy management granularity is required, consider 3rd party software such as zDynaCap

Software cost savings are not just for the privileged; they’re for everyone!