21st Century Mainframe Capacity Planning Requirements

With nearly 5 decades of longevity the IBM Mainframe has changed beyond recognition in terms of CPU capacity and performance capability.  The Capacity Planning discipline for the IBM Mainframe server became more advanced and proactive in the early 1990’s, perhaps coinciding with the introduction of Parallel Sysplex structures associated with the MVS/ESA operating system.  Therefore the requirement to measure and model the impact of workload movement between LPAR and CPC structures became important, if not mandatory.

The fundamental building-block for Mainframe CPU usage analysis is SMF Type 7n records (I.E. RMF or CMF), where this data was typically processed by MXG, MICS and maybe CIMS (acquired by IBM), generally using SAS for reporting purposes.  Other tools, including but not limited to, BEST/1 (acquired by BMC) and PERFMAN (acquired by ASG) also offered capacity planning and performance management solutions.  Therefore, for 20+ years the fundamental Mainframe CPU usage data and associated tools have remained largely the same.  However, maybe the IBM Mainframe server has changed, both in terms of underlying CPU chip technology and customer workload deployment…

I often hear capacity planners state something along the lines of “I can report on the past with 100% accuracy, but predicting the future might prove to be a little more difficult”!  Once again, going back to the early 1990’s, the IBM Mainframe had a typical if not generic workload profile deployment, namely On-Line Transaction Processing (E.g. CICS, IMS DC) and related Database Management Subsystems (E.g. DB2, IMS DB) with Batch Processing.  This somewhat limited workload profile simplified the Capacity Planning process, applying estimates of growth based on current usage.  However, when the Mainframe became more pervasive, taking on new workloads, how was the capacity planner supposed to estimate CPU requirements for their new business application workload?

IBM introduced the Large Systems Performance Reference (LSPR) methodology, designed to provide relative processor capacity data for IBM System/370, System/390 and z/Architecture processors.  All LSPR data is based on a set of measured benchmarks and analysis, covering a variety of System Control Program (SCP) and workload environments.  LSPR data is intended to be used to estimate the capacity expectation for a production workload when considering a move to a new processor.  Although LSPR data is provided on an “as is” basis, with no warranty, it at least provides the Mainframe Capacity Planner with some insight into their CPU sizing challenge.  For many years, LSPR provided the only other data source, as well as RMF (CMF) for Mainframe CPU sizing.  However, is there a more accurate data source, perhaps based on real-life customer data?

With the introduction of the IBM System z10 server (February 2008), a new function CPU MF (CPU Measurement Facility) was incorporated.  Let’s not forget, z10 is now an n-2 technology, having been superseded by the z196/z114 and the latest zBC12/zEC12 generation of servers.  So each and every committed Mainframe customer should be positioned to benefit from the CPU MF function.

CPU MF provides optional hardware assisted collections of information about logical CPU activity executed over a specified interval in selected Logical Partitions (LPARs).  The CPU MF counters function is intended to be run on a constant basis to collect long-term performance data (I.E. SMF Record 113), in a similar manner to how you collect other performance data.  Therefore this data source can be deployed to further refine the accuracy of Mainframe CPU capacity planning projections.  Let’s not forget:

The primary on-going requirement for Mainframe Capacity Planning is to minimize any over or under capacity provision from forecast predictions, used for Mainframe server acquisition purposes”

Mainframe chip technology has also changed in complexity, especially with the latest iterations of CPU chips associated with the z10 server (E.g. POWER 6) onwards, incorporating many layers of cache memory.  Workload capacity performance will be quite sensitive to how deep into the memory hierarchy the processor must go to retrieve the workload’s instructions and data for execution.  Best performance occurs when the instructions and data are found in the cache(s) nearest the processor so that little time is spent waiting prior to execution; as instructions and data must be retrieved from farther out in the hierarchy, the processor spends more time waiting for their arrival.

As workloads are moved between processors with different memory hierarchy designs, performance will vary as the average time to retrieve instructions and data from within the memory hierarchy will vary.  Additionally, once on a processor this component will continue to vary significantly as the location of a workload’s instructions and data within the memory hierarchy is affected by many factors including; locality of reference, IO rate, competition from other resources (E.g. Applications, LPARs, et al), and so on…

The most performance sensitive area of the memory hierarchy is the activity to the memory nest, namely, the distribution of activity to the shared caches and memory.  IBM introduced new terminology, namely Relative Nest Intensity (RNI), indicating the level of activity to this part of the memory hierarchy.  Using data from CPU MF, the RNI of the workload running in an LPAR may be calculated.  The higher the RNI, the deeper into the memory hierarchy the processor must go to retrieve the instructions and data for that workload.

Therefore the Mainframe Capacity Planner does have various data sources available to forecast how an existing or new workload might perform on an upgraded processor (CPC), further refining their CPU capacity requirement forecast.  As always, the final stage in a Mainframe Capacity Planning process is to input the forecast data into the IBM Processor Capacity Reference (zPCR) tool, to determine the exact model and associated resource configuration options for their unique business workload mix.

To summarize, does your Mainframe Capacity Planning process incorporate all of these CPU sizing data sources, in an easy-to-use and cost efficient manner?

Founded by former IBM staffers and capacity planning and performance management industry veterans William Shelden, PhD, and William Hart, PerfTechPro is designed to deliver sophisticated, affordable, easy-to-use solutions for IT management professionals looking for fast, insightful help without high-cost, complex and time-consuming purchasing and licensing requirements.

PerfTechPro for z/OS is a Capacity Planning and Performance Measurement tool specifically designed for the cost conscious and savvy 21st Century data centre.  PerfTechPro for z/OS is the next evolution in Mainframe Capacity Planning tools, having been architected from ground zero using the latest techniques.  PerfTechPro for z/OS provides sophisticated capacity and performance management capabilities, affordable by any sized data centre:

  • Clean, intuitive, easy-to-use interface and graphical representations, for example:
    • Consolidated instance lists guide users to make informed selections
    • Descriptive dialog boxes detail your configuration
    • Anticipates, pre-loads data to speed retrieval, reporting and analysis
    • Automated data management
  • Forecasting and modelling
  • Non-proprietary database, enabling data use outside of PerfTechPro
  • Capable of automated collection, analysis and reporting of SMF 113 records produced by the IBM CPU Measurement Facility (CPU MF)
  • Supports measurement, management of zAAP & zIIP Specialty Engines
  • Automated analysis and management of all key capacity and performance metrics, for example:
    • GPP Utilization of All LPARs
    • MIPS Usage by CPU
    • DASD Response Times
    • Address Spaces Dispatched and Waiting 

PerfTechPro for z/OS also simplifies the data management process associated with Mainframe Capacity Planning.  Using a streamlined process on the z/OS host, PerfTechPro extracts and formats the data required from various SMF sources (E.g. SMF Type 7n, Type 113); delivering an optimized Performance Data Base (PDB) for use by the Windows based GUI.  This optimized file safeguards fast processing during the reporting and forecasting activities, while simplifying any data aggregation processes (E.g. Weekly, Monthly, et al).  Moreover, PerfTechPro allows this data to be stored in non-proprietary (E.g. Microsoft Access, SQL Server, MySQL, Oracle) and multiple database structures, as and if required.

PerfTechPro for z/OS is a simple-to-use and cost-efficient solution, allowing customers to quickly save time and money from their Capacity Planning and Performance Measurement solution.  Ultimately the bottom line objective for PerfTechPro for z/OS is to provide a best-of-breed solution for a very competitive cost. PerfTechPro for z/OS delivers business value by:

  • Ensuring enterprise zSeries Mainframe server resources are being used efficiently
  • Maximizing opportunities for cost-savings
  • Anticipating & responding to increased demand on resources
  • Reducing costs by exploiting periods of lower resource demand
  • Discerning underlying causes of performance and capacity issues
  • Eliminating time-consuming manual tracking, recording and analysis
  • Implementing disciplined management of valuable business resources

In conclusion, the Mainframe Capacity Planning process continues to evolve, forever striving to reduce any discrepancies in CPU requirements forecasting, which of course, have a high associated cost consideration.  Integrating CPU MF (SMF Type 113) must be a mandatory requirement, safeguarding that CPU Sizing, Forecasting, Modelling and Correlation Analysis activities are optimized.  Additionally, the actual process of Mainframe Capacity Planning is an activity that requires great skill and considerable associated responsibility.  A modern day solution such as PerfTechPro for z/OS is worthy of consideration, having been designed by a team with a heritage in delivering Mainframe Capacity Planning solutions, architecting function compatible with modern day functionality, while considering the latest technology zSeries CPU chip design considerations.

Application Performance Tuning – Why Bother?

With older generations of Mainframe Operating Systems, certainly MVS/XA and perhaps MVS/ESA, application performance tuning was a necessity, not an afterthought.  Quite simply, the cost of Mainframe resources, namely CPU, memory and disk, dictated that your mission critical business application might not perform to business requirements, unless you tuned your programming code.  Programmers, both of the system and application variety understood the bits and bytes of available programming languages (E.g. ASM, COBOL, PL/I) and Operating System (I.E. MVS), collaborating either via proactive process, or reactive problem solving.  With the continuing reduction of IT hardware component costs, the improvement in Operating Systems (E.g. 64-bit architecture) and newer programming languages (E.g. C, C++), it seems that application performing tuning is somewhat of an afterthought, but at what cost?

We all know that the cost of a Mainframe MIPS is significant, and although it might have reduced dramatically from a hardware viewpoint, from a software viewpoint, the cost remains largely static at ~£1,500-£3,500, per year, depending on your configuration.  So if your applications are burning several hundred if not several thousand extra MIPS unnecessarily, that’s very expensive indeed!  Additionally and just as importantly, a badly tuned system will manifest itself in slower transaction response times and longer batch jobs, if applicable, which could impact service availability.  So why is there a seeming reluctance to tune business applications, Mainframe resident or not?

If ever there was a functional IT area where the skills gap has never been wider, then application performance tuning is said skill, when comparing the salty old sea dog Mainframe dinosaur, with the newer Mainframe technician!

From an application development process viewpoint, where does the application performance tuning task live; before or after implementation?  The cynical amongst us will know; if it’s after implementation, there’s a strong likelihood said activity will never be performed!  If it’s before implementation, how many projects incorporate a meaningful stress test, or measure transaction response times versus an SLA or KPI metric?  Additionally, if the project is high-priority and/or running behind schedule, then performance testing is an activity that is easily removed…

Back in the good old days, the late 1980’s to early 1990’s, some application performance tuning tools did start to emerge, most notably Strobe.  Strobe was useful to even the most accomplished of system and application programmer personnel, and invaluable to less experienced personnel, and so arguably Strobe became the de facto software tool for tuning Mainframe applications.  However, later releases of MVS (E.g. OS/390 and z/OS), the non-event that was the Year 2000 (Y2K), seemed to remove the focus on and importance of application tuning.

Arguably most importantly of all, that software MIPS cost item, where Strobe and its competitors (E.g. ASG/BMC TriTune, CA Application Tuner, IBM APA, Macro4 ExpeTune, et al) will utilize even more CPU to capture diagnostic trace information, contributed to the demise of application performance tuning.  However, those companies that have undertaken such application tuning activities in the last decade or so are sitting pretty, having reduced the CPU (MIPS) resource consumed, lowering TCO and optimizing performance accordingly.  In the 21st Century, these software solutions are classified as Application Performance Management (APM) solutions.

Is there a better and easier way to stimulate an interest in the application performance tuning discipline?  If the desire exists to tune an application, lowering CPU MIPS usage, optimizing service performance, then the traditional tools and methods mentioned previously exist, but perhaps a new (or not so new) CPU performance data source exists…

With the introduction of the z10 server, a new function CPU MF (CPU Measurement Facility) was incorporated.  Let’s not forget, z10 is now an n-2 technology, having been superseded by the z196/z114 and the latest zBC12/zEC12 generation of servers.  So each and every committed Mainframe customer should be positioned to benefit from the CPU MF function.

CPU MF provides optional hardware assisted collections of information about logical CPU activity executed over a specified interval in selected Logical Partitions (LPARs).  The CPU MF counters function is intended to be run on a constant basis to collect long-term performance data (I.E. SMF Record 113), in a similar manner to how you collect other performance data.  I have previously briefly discussed how CPU MF SMF data can be used to increase Mainframe Server Capacity Planning efficiencies. 

The CPU MF sampling function is a short duration, precise function that identifies where CPU resources are being used, to help you improve application efficiency.  Put very simply, CPU MF sampling data has minimal CPU overhead (E.g. ~0.1-1.0%) when collecting data (I.E. z/OS Hardware Instrumentation Services – HIS), but this data can then be used to identify CPU “hot spots”, which can then be further analysed to identify the “areas of code” generating the high CPU usage.  However, it was forever thus, whether an APM tool, or CPU MF sampling data, high CPU usage can be identified, but the application programmer must undertake the task of optimizing the application code!

IBM have done a great job in providing CPU MF counters data, optimizing the Capacity Planning process with the SMF 113 record, and the realm of possibility exists with the sample data, but a software solution is required to analyse and summarize this data.

Currently there are very few if only one software solution that analyses CPU MF sample data, namely zHISR from Phoenix Software International.  zHISR interfaces directly with z/OS Hardware Instrumentation Services to collect data for hotspot analysis of customer, vendor, or operating system program execution.  zHISR features include:

  • Support for up to 128 simultaneous data collections events.  zHISR collections do not interfere with any HIS functions including sample or counter collection.
  • System console commands for many zHISR functions.
  • An Application Programming Interface to COBOL and Assembler for starting and stopping data collections. Collection lengths for API generated collections have a time range of one second or more.
  • Ability to schedule a collection with JCL so that collection starts when a given job or step begins.
  • Ability to store data collections as z/OS data sets or UNIX files.
  • Support for collections against CICS/TS transactions.
  • Analysis based on a time range within the collected data for a narrower spotlight on problem code.

An intuitive ISPF dialog allows the user to easily produce a CPU hot spots analysis, which can then be used for identifying the offending code sections.  The user can then drill down and highlight the high CPU CSECT and program offset (instruction), comparing with their Associated Data (ADATA), and thus the source programming instruction.  Therefore the skill required to perform analysis is minimal, as is the CPU overhead in collecting analysis data, and so eradicating the potential barriers when embarking on an application tuning initiative.  Furthermore, the actual cost of deploying the zHISR software is not onerous and so perhaps each and every committed Mainframe user can easily include application performance tuning into their application development lifecycle processes. 

zHISR has a UNIX file system interface that lets you navigate the system and browse or delete files.  With zHISR, users can start and stop hardware event data collections and view the status of the current or prior HIS run.  zHISR also includes a memory display/alter utility that lets you view main storage in the CPU you are logged on to.  If zIIPs are present and zHISR is defined as an authorized subsystem, nearly all of the CPU processing used by zHISR is redirected to a zIIP.

There are also instances, however few and far between, where Mainframe customers have written their own proprietary in-house OLTP (On-Line Transaction Processor) and Relational Database Management Subsystem (RDBMS), where traditional APM software tools can’t provide a solution, only interfacing with underlying subsystems (E.g. Adabas, CICS, DB2, IDMS, WebSphere, et al).  In these instances, CPU MF and zHISR offer a solution to help such customers, who probably face challenges when they upgrade their Mainframe servers, safeguarding software and application code is compatible with the new hardware, and ideally, exploits the latest functionality.

In conclusion, application performance tuning has to be a very important if not mandatory activity for the Mainframe Data Centre.  Whether via CPU MF or traditional APM software solutions, the cost reduction and performance improvement benefits of tuning should be compelling reasons to proactively engage in application tuning activities.  From a skills viewpoint, maybe the KISS (Keep It Simple Stupid) principle can apply, where CPU MF collects the data very simply and efficiently, complemented by zHISR, analysing the data in an intuitive and cost optimized manner.

So turning the subject matter on its head, Application Performance Tuning – Why Bother?  Why not!

Further information can be found from my z/OS Application Performance Tuning presentation, delivered at UK GSE in November 2012.

The Problem With Problems – Are You zAware?

Several decades ago and observing potential challenges with hardware, most of us seasoned Mainframe folk would have been familiar with the terms Mean Time Between Failure (MTBF) and Mean Time To Repair (MTTR), although repair might become resolution, replacement, and so on.  As hardware has become more reliable, with very few if any single points of failure, we don’t really use these terms for hardware, but perhaps if we don’t use them for problems associated with our business applications, we should…

Today we generally simplify this area of safeguarding business processing metrics (E.g. SLA, KPI) with the Reliability, Availability and Serviceability (RAS) terminology.  So whether hardware related by an IHV such as IBM, or software related by ISV’s such as ASG, BMC, CA, IBM, naming but a few, or application code writers, we’re all striving to improve the RAS metrics associated with our IT discipline or component.

There will always be the ubiquitous software bugs, human error when making configuration changes, and so on, but what about those scenarios we might not even consider to be a problem, yet they can have a significant impact on our business?  An end-to-end application transaction could consist of an On-Line Transaction Processor (E.g. OLTP, CICS, IMS, et al), a Relational Database Management Subsystem (E.g. RDBMS, DB2, ADABAS, IDMS, et al), a Messaging Broker (E.g. WebSphere MQ), a Networking Protocol (E.g. TCP/IP, SNA, et al), with all of the associated application infrastructure (E.g. Storage, Operating System, Server, Application Programs, Security, et al); so when we experience a “transaction failure”, which might be performance related, which component failed or caused the incident?

Systems Management disciplines dictate Mainframe Data Centres deploy a plethora of monitors (E.g. ASG-TMON, BMC MAINVIEW, CA SYSVIEW, IBM Tivoli OMEGAMON, et al), but these software solutions typically generate a significant amount of data, but what we really need for successful problem solving is the right amount of meaningful information.

So ask yourself the rhetorical question.  You know it; how many application performance issues remain unsolved, because we just can’t identify which component caused the issue, or there is just too much data (E.g. System Monitor Logs) to analyse?  If you’re being honest, I guess the answer is greater than zero, perhaps significantly greater.  Further complications can occur, because of the collaboration required to resolve such issues, as each discipline, Transaction, Databases, Messaging, Networking, Security, General Systems Management, Performance Monitoring, typically reside in different teams…

IBM System z Advanced Workload Analysis Reporter (IBM zAware) is an integrated, self-learning, analytics solution for IBM z/OS that helps identify unusual system behaviour in near real time.  It is designed to help IT personnel improve problem determination so they can restore service quickly and improve overall availability.  zAware integrates with the family of IBM Mainframe System Management tools, including Runtime Diagnostics, Predictive Failure Analysis (PFA), IBM Health Checker for z/OS and z/OS Management Facility (z/OSMF).

IBM zAware runs in an LPAR on a zEC12 or later CPC.  Just like any System z LPAR, IBM zAware requires processing capacity, memory, disk storage, and connectivity.  IBM zAware is able to use either general purpose CPs or IFLs, which can be shared or dedicated.  It is generally more cost effective to deploy zAware on an IFL.

Used together with other Mainframe System Management Tools, zAware provides another view of your system(S) behaviour, helping answer questions such as:

  • Are my systems showing abnormal message activity?
  • When did this abnormal message activity start?
  • Is this abnormal message activity repetitive?
  • Are there messages appearing that have never appeared before?
  • Do the times of abnormal message activity coincide with problems in the system?
  • Is the abnormal behaviour limited to one system or are multiple systems involved?

IBM zAware creates a model of the normal operating characteristics of a z/OS system using message data captured from OPERLOG.  This message data includes any well-formed message captured by OPERLOG (I.E. A message with a tangible Message ID), whether it is from an IBM product, a non-IBM product, or one of your own application programs.  This model of past system behaviour is used as the base against which to compare message patterns that are occurring now.  The results of this comparison might help answer these questions.

IBM zAware determines, using its model of each system, what messages are new or if messages have been issued out of context based on the past normal behaviour of the system.  The model contains patterns of message ID occurrence over a previous period and does not need to know what job or started task issued the message. It also does not need to use the text of a message.

In summary, zAware is a self-learning technology, for newer zSeries Servers (I.E. zEC12 onwards), which can help reduce the time to identify the “area” of where a problem occurred, or is occurring (E.g. Near Real-Time), allowing a technician to fully identify the problem diagnosis and consider potential resolutions.  Put very simply, zAware will assist in identifying the problem, but it does not fully qualify the problem and associated resolution.  This is a good quality, as ultimately the human technician must complete this most important of activities!

So what if you’re not a zEC12 user or you’re concerned about increased costs because you don’t deploy IFL speciality engines?

ConicIT/MF is a Proactive Performance Management for First Fault Performance Problem Resolution solution.  By interfacing with standard system monitors (E.g. ASG-TMON, BMC MAINVIEW, CA SYSVIEW, IBM Tivoli OMEGAMON), ConicIT/MF uses sophisticated mathematic models to perform proactive, intelligent and significant data reduction, quickly highlighting possible causes of problems, allowing for efficient problem determination.  Put another way, Systems Management Performance Monitors provide a wealth of data, but sometimes there’s too much data and not enough information.  ConicIT safeguards that the value of the data provided by Systems Management Performance Monitors is analyzed and consolidated to expedite performance problem resolution.

ConicIT runs on a distributed Linux system external to the Mainframe system being monitored.  ConicIT is a completely agentless architecture which doesn’t require installation on the Mainframe system being monitored.  It receives data from existing monitors (E.g. ASG-TMON, BMC MAINVIEW, CA SYSVIEW, IBM Tivoli OMEGAMON, et al), through their standard interfaces.  3270 emulation enables ConicIT to appear as just another operator to the existing monitor and adds no more load to the monitored system than would adding an additional human operator.

Until a problem is predicted ConicIT requests basic monitor information at a very low frequency (about once per minute), but if the ConicIT analysis senses a performance problem brewing, its requests for information increase, but never so much as to effect the monitored system.  The maximum load generated by ConicIT is configurable and ConicIT supports all the major Mainframe monitors.

The monitor data stream is retrieved by parsing the data from the various (E.g. Log) data sources.  This raw data is first sent to the ConicIT data cleansing component.  Data from existing monitors is very “noisy”, since various system parameters values can fluctuate widely even when the system is running perfectly.  The job of the data cleansing algorithm is to find meaningful features from the fluctuating data.  Without an appropriate data cleansing algorithm it is very difficult or impossible for any useful analysis to take place. Such cleansing is a simple visual task for a trained operator, but is very tricky for an automated algorithm.

The relevant features found by the data cleansing algorithm are then processed to create appropriate variables.  These variables are created by a set of rules that can process the data and apply transformations to the data (E.g. combine single data points into a new synthesized variable, aggregate data points) to better describe the relevant state of the system.

These processed variables are analyzed by models that are used to discover anomalies that could be indicative of a brewing performance problem.  Each model looks for a specific type of statistical anomalies that could predict a performance problem.  No single model is appropriate for a system as complex as a large computing system, especially since the workload profile changes over time.  So rather than a single model, ConicIT generates models appropriate to the historical data from a large, predefined set of prediction algorithms.  This set of active models is used to analyze the data, detect anomalies and predict performance degradation.  The active models vote on the possibility of an upcoming problem in order to make sure that as wide a set of anomalies as possible are covered, while lowering the number of false alerts.  The set of active models change over time based on the results of an offline learning algorithm which can either generate new models based on the data, or change the weighting of existing models.  The learning algorithm is run in the background on a periodic basis.

When a possible performance problem is predicted by the active models, the ConicIT system takes two actions.  It sends an alert to the appropriate consoles and systems, and also instructs the monitor to collect information from the effected systems more frequently.  The result is that when IT personnel analyze the problem they have the information describing the state of the system and the effected system components as if they were watching the problem while it was happening.  The system also uses the information from the analysis to point out the anomalies that led the system to predict a problem, thereby aiding in root cause analysis of the problem.

So whether zAware or ConicIT, there are solutions to assist today’s busy IT technician to improve the Reliability, Availability and Serviceability (RAS) metric for their business, by implementing practicable resolutions for those problems, which previously, were just too problematic to solve.  zAware can offload its processing to an IFL, as and if available, whereas ConicIT performs its processing on a Non-Mainframe platform, and thus can support all zSeries Servers, not just the zEC12 platform.

Ultimately both the zAware and ConicIT solutions have the same objective, increasing Mean Time Between Failure (MTBF) and decreasing Mean Time To Resolution (MTTR), optimizing IT personnel time accordingly.

Flash Express – Back To The Future

It’s just not science fiction films that get rebooted or reimagined, the same thing happens in the technology world, although not always as obviously.  Over the years we have seen Star Trek, Star Wars, and numerous superhero stories be updated for current day audiences, some better than others, and similarly, flash memory or Solid State Drives (SSD) is not a new concept in the Mainframe world.  For me, I wonder whether the Back To The Future films might be rebooted, and if they do, hopefully it’s done well!  Anyway, I digress, so back to the Mainframe flash memory observations… 

Flash Express is a new feature for the zEC12 server, designed to help drive System availability and performance to even higher levels. 

Flash Express cards are delivered as a RAID 10 mirrored pair for superior resiliency and reliability.  In the unlikely event of device failure, Flash Express cards can also be concurrently replaced.  The cards are designed for superior wear levelling and have a long expected lifetime.  From a security perspective, Flash Express stored data remains protected.  Data is encrypted on the Flash Express adapter with 128-bit AES encryption.  Encryption keys are stored on smart cards that are plugged into the SE.  Removing the smart cards renders the data on the card inaccessible. 

In the first instance, IBM seems to be targeting Systems Paging (I.E. Auxiliary Storage Manager – ASM) and System Dumps (E.g. SVC, Standalone) as an introduction to this new level of storage hierarchy. 

Flash Express reduces latency for critical system paging that might otherwise impact the availability and performance.  For system paging flexibility and efficiency, Flash Express is a higher performance option when compared with traditional auxiliary storage (I.E. Disk).  z/OS uses both Flash Express and page data sets for auxiliary storage by paging data to the preferred storage medium first, based on response times, data set characteristics, and other parameters.  Wherever possible, the system will page first to Flash Express resulting in faster performance.  Especially for data intensive applications the use of Pageable Large Pages with Flash Express enables the transfer of large amounts of data at faster speeds, which can result in improved performance for DB2 analytic workloads.

NB. Because Flash Express is not persistent across IPL events, it cannot be used for Virtual I/O or PLPA data used in warm starts.  VIO and PLPA datasets must still be defined on DASD.

During diagnostic collection, as in SVC or standalone dumps, IBM states that systems can become sluggish effectively rendering key systems unavailable.  When data is transferred into main memory as part of a dump event, the fast I/O rates and low latency associated with Flash Express provide decreased first failure data capture time, and faster page-ins of the critical pages needed for dump creation.  This allows the system to return to normal workload performance faster, without incurring extra delays.

So clearly, adding a flash memory layer to the Mainframe storage hierarchy can only deliver benefit, and over time, seemingly Flash Express can be used for other I/O intensive applications, further improving response times and decreasing associated CPU cycles.

No doubt the non-Mainframe technician might say “so what, we’ve been doing flash and SSD for nearly 20 years.  Typical Mainframe, always behind the rest of the IT world”!

Hmmm, perhaps not, hence the Back To The Future reference.  Depending on your viewpoint, and perhaps urban myth, from a release date viewpoint, but in the mid-to-late 1980’s, StorageTek introduced a device called the 4080, AKA Control Data Set Manager (CDSM) for their Host Software Component (HSC) product.  You will still see this 4080 device referenced in the HSC Systems Programmer’s Guide technical manual.  The 4080 is a Solid State Device, utilizing memory storage, emulating 3380 and 3390 device types, allowing z/OS (MVS) data sets to be created and accessed, as and if required.  Arguably if not certainly, this was the first instance of logical and physical device separation for Mainframe disks, and urban myth might dictate that this StorageTek 4080 technology was used or at least borrowed for the first EMC Symmetrix disk subsystem…

StorageTek typically packaged this 4080 device within a NearLine (ATL) solution sale and customers used this SSD device for HSC Control Data Sets, but also other high I/O files, such as the IMS Write-Ahead Data Set (WADS).  Therefore, Flash Express isn’t the first SSD solution for the Mainframe, and similarly, Flash Memory/Cards, USB Sticks, et al, aren’t the first SSD type technology in the IT world.

As somebody (Machiavelli) far wiser than I once said “whoever wishes to foresee the future must consult the past”.  The longevity of the Mainframe, nearly 50 Years of technical innovation, largely dictates that technological ideas are at least borrowed from the Mainframe, whether System Paging, Fibre Channels, Storage Area Networking (SAN), Virtual Storage, Flash/SSD Memory, naming but a few.

Should today’s zEC12 Mainframe customer deploy Flash Express?  Without doubt, but to deliver maximum ROI and benefit, it should be considered as more than a faster paging and dump solution, and we look forward to how IBM will further enhance this product offering.

IBM Mainframe: Workload License Charges (WLC) Pros & Cons

It is estimated that less than half of eligible IBM Mainframe customers deploy the VWLC pricing mechanism, which in theory, is the lowest cost IBM software pricing metric.  Why?  In the first instance, let’s review the terminology…

Workload License Charges (WLC) is a monthly software license pricing metric applicable to IBM System z servers running z/OS or z/TPF in z/Architecture (64-bit) mode.  The fundamental ethos of WLC is a “pay for what you use” mechanism, allowing a lower cost of incremental growth and the potential to manage software cost by managing associated workload utilization.

WLC charges are either VWLC (Variable) or FWLC (Flat).  Not all IBM Mainframe software products are classified as VWLC eligible, but the major software is, including z/OS, CICS, DB2, IMS and WebSphere MQ, where these products are the most expensive, per MSU.  What IBM consider to be legacy products, are classified as FWLC.  More recently a modification to the VWLC mechanism was announced, namely AWLC (Advanced), strictly aligned with the latest generation of zSeries servers, namely zEC12, z196 and z114.  For the smaller user, the EWLC (Entry) mechanism applies, where AEWLC would apply for the z114 server.  There is a granular cost structure based on MSU (CPU) capacity that applies to VWLC and associated pricing mechanisms:

Band MSU Range
Base 0-3 MSU
Level 0 4-45 MSU
Level 1 46-175 MSU
Level 2 176-315 MSU
Level 3 316-575 MSU
Level 4 576-875 MSU
Level 5 876-1315 MSU
Level 6 1316-1975 MSU
Level 7 1976+ MSU

Put simply, as the MSU band increases, the related cost per MSU decreases.

IBM Mainframe users can further implement cost control by specifying how much MSU resource they use by deploying Sub-Capacity and Soft Capping techniques.  Defined Capacity (DC) allows the sizing of an LPAR in MSU, and so said LPAR will not exceed this MSU amount.  Group Capacity Limit (GCL) extends the Defined Capacity principle for a single LPAR to a group of LPARs, and so allowing MSU resource to be shared accordingly.  A potential downside of GCL is that is one LPAR of the group can consume all available MSU due to a rogue transaction (E.g. loop).

Sub-Capacity software charges are based upon LPAR hardware utilization, where the product runs, measured in hourly intervals.  To smooth out isolated usage peaks, a Rolling 4-Hour Average (R4HA) is calculated for each LPAR combination, and so software charges are based on the Monthly R4HA peak of appropriate LPAR combinations (I.E. where the software product runs) and not based on individual product measurement.

Once a Defined Capacity LPAR is deployed, this informs WLM (Workload Manager) to monitor the R4HA utilization of that LPAR.  If the LPAR R4HA utilization is less than the Defined Capacity, nothing happens.  If the LPAR R4HA utilization exceeds the Defined Capacity, then WLM signals to PR/SM and requests that Soft Capping be initiated, constraining the LPAR workload to the Defined Capacity level.

If a user chooses a Sub-Capacity WLC pricing mechanism, they will be required by IBM to submit a monthly Sub-Capacity Reporting Tool (SCRT) report.  Monthly WLC invoices are based upon hourly utilization metrics of LPAR hardware utilization, where the software product executes.  The cumulative R4HA and bottom line WLC billing metric is calculated for each product and associated LPAR group and not based on individual product measurement.

Bottom Line: From a Soft Capping viewpoint, the customer only pays for WLC software based upon the Defined Capacity (DC) or Rolling 4-Hour Average (R4HA), whichever is the lowest.  So whether a customer uses Soft Capping or not, in all likelihood, there will be occasions when their workload R4HA is lower than their zSeries server MSU capacity.

So, at first glance, VWLC seems to provide a compelling pricing metric, based upon Sub-Capacity and a pay for what you use ethos, and so why wouldn’t an IBM Mainframe user deploy this pricing metric?

The IBM Planning for Sub-Capacity Pricing (SA22-7999-0n) manual states “For IBM System z10 BC and System z9 BC environments, and z890 servers, EWLC pricing is the default for z/OS systems, and Sub-Capacity pricing is always the best option.  For IBM zEnterprise 114, environments, AEWLC pricing is the default for z/OS systems, and Sub-Capacity pricing is always the best option.  For IBM zEnterprise 196, System z10 EC and System z9 EC environments, and other zSeries servers, Sub-Capacity pricing is cost-effective for many, but not all, customers.  You might even find that Sub-Capacity pricing is cost effective for some of your CPCs, but not others (although if you want pricing aggregation, you must always use the same pricing for all the CPCs in the same sysplex)”.

Conclusion: For all small Mainframe users qualifying for the EWLC (AEWLC) pricing metric, arguably this pricing mechanism is mandatory.  For the majority of larger Mainframe users, the same applies, although a granularity of adoption might be required.  IBM also have a disclaimer “Once you decide to use Sub-Capacity pricing for a specific operating system family, you cannot return to the alternative pricing methods for that operating system family on that CPC.  For example, once you select WLC you may not switch back to PSLC without prior IBM approval”.  However, the requisite contractual exit clause option does exist; the customer can switch back to the PSLC pricing metric.

Some IBM Mainframe users might object to a notion of Soft Capping, relying upon their tried and tested methodology of LPAR management via the number of CPs allocated and associated PR/SM Weight.  This is seemingly a valid notion and requirement, prioritizing performance ahead of cost optimization.

Conclusion: As previously indicated, with VWLC, SCRT invoices are generated upon a premise of the customer only pays for WLC software based upon the Defined Capacity (DC) or Rolling 4-Hour Average (R4HA), whichever is the lowest.  So the VWLC pricing mechanism should deliver a granularity of cost savings, typically higher for a Soft Capping environment.

Some IBM Mainframe users might just believe that nothing can match their Parallel Sysplex Licensing Charge (PSLC) mechanism, first available in the late 1990’s, which might be attributable to other 3rd party ISV’s who cannot and will not allow for their software to be priced on a Sub-Capacity basis.  In reality, adopting the VWLC pricing mechanism delivers ~5% cost savings when compared with PSLC, as indicated by the IBM Planning for Sub-Capacity Pricing Manual (SA22-7999-0n) and related Sub-Capacity Planning Tool (SCPT).

Conclusion: Adopting Sub-Capacity based pricing metrics can only be a good thing.  If your 3rd party ISV supplier doesn’t recognise Sub-Capacity pricing, whether MIPS or MSU based, perhaps you should consider your relationship with them.  Regardless, the z10 server was the last IBM Mainframe to incorporate the “Technology Dividend” solely based on faster CPU chips.  The lower cost WLC pricing metric is now only available with the AWLC and related (E.g. AEWLC) pricing metrics, as per the z196, z114 and zEC12 servers.

Some customers might state that there is a lack of function or granularity of policy definition for IBM supplied Soft Capping (E.g. DC, GCL) or Workload Management (WLM) techniques.  To some extent this is a valid argument, but wasn’t it forever thus with IBM function?  Sub-Capacity implementation is possible via IBM, as is Workload Management (WLM), Soft Capping or not, but should the customer require extra functionality, 3rd party software solutions are available.

The zDynaCap software solution from zIT Consulting delivers a “Capacity Balancing” mechanism, integrating with R4HA and WLM methodologies, but constantly monitoring MSU usage to determine whether CPU resource can be reallocated to Mission & Time Critical workloads, based upon granular customer policies.  The only guarantee in a multiple LPAR environment, for a Mission & Time Critical LPAR to receive all available MSU resource, Soft Capping or not, is to inactivate all other LPARs!  Clearly this is not an acceptable policy for any installation, and so a best endeavours policy applies for PR/SM DC, GCL and Weight settings.

Conclusion: z/OS workloads change constantly, whether the time of day (E.g. On-Line, Batch) or period of the year (E.g. Weekly, Monthly, Quarterly, Yearly) or just by customer demand (E.g. 24 Hour Transaction Application).  Therefore a dynamic MSU management solution such as zDynaCap is arguably mandatory, implementing the optimum MSU management policy, whether for purely performance reasons, safeguarding the Mission & Time Critical workload isn’t impacted by lower priority workloads, or for cost reasons, optimizing MSU usage for the best possible monthly WLC cost.

In conclusion, not considering and arguably not implementing z/OS VWLC related pricing mechanisms is impractical, because:

  • The VWLC and AWLC related pricing metrics deliver the lowest cost per MSU for eligible z/OS software
  • When compared with PSLC, VWLC related pricing mechanisms deliver conservative ~5% cost savings
  • A pay for what you use and therefore Sub-Capacity pricing mechanism, not the installed MSU capacity
  • If extra MSU policy management granularity is required, consider 3rd party software such as zDynaCap

Software cost savings are not just for the privileged; they’re for everyone!

IBM Mainframe Capacity Planning & Software Cost Control Interaction?

The cost of IBM Mainframe software is an extensive subject matter that is multi-faceted and can generate much discussion. The importance of optimizing Mainframe software costs is without doubt, as it is the most significant Mainframe TCO component, having increased from ~25-50%+ of overall expenditure in the last decade or so. Conversely Mainframe server hardware costs have largely stabilized at ~15-25% of TCO in the same time period. However, Mainframe Capacity Planning activities have evolved over the last several decades or so, where hardware costs were the primary concern and the number of IBM Mainframe software pricing mechanisms was limited. Of course, in the last decade or so, IBM Mainframe software pricing mechanisms have evolved, with a plethora of acronyms, ESSO, ELA, IPLA, OIO, PSLC, WLC, VWLC, AWLC, IWP, naming but a few!

Can each and every IBM Mainframe user clearly articulate their Mainframe Capacity Planning and Software Cost Control policies, and which person in their organization performs these very important roles? Put another way, not forgetting Software Asset Management (SAM), should there be a Software Cost Control specialist for IBM Mainframe Data Centres…

If we consider the traditional Mainframe Capacity Planning role, put very simply, this process typically produces a 3-5 year rolling plan, based upon historical data and future capacity requirements. These requirements can then be modelled with the underlying hardware (E.g. z10, z114/z196, zEC12) server, identifying resource requirements accordingly, namely number of General Processors (GPs), Specialty Engines (E.g. zIIP, zAAP, IFL), Memory, Channels, et al. Previously, up until ~2005, customer requirements would be articulated to IBM, cross-referenced with LSPR (Large System Performance Reference) and an optimum hardware configuration derived. Since ~2005, IBM made their zPCR (Processor Capacity Reference) tool Generally Available, allowing the Mainframe customer to “more accurately” capacity plan for IBM zSeries servers.

Other enhancements to more accurately determine the ideal zSeries server include sizing based on actual customer usage data generated by the CPU MF facility introduced with the z10 server. CPU MF delivers a refinement when compared with LSPR, refining the zPCR process with real life customer usage data, compared to the standard simulated LSPR workloads.

In summary, the Mainframe Capacity Planning process has evolved to include new tools and data to refine the process, but primarily, the process remains the same, size the hardware based upon historical data and future business requirements. However, what about Mainframe software usage and therefore cost interaction?

Each and every IBM Mainframe user relies heavily on the IBM Operating System (I.E. z/OS, z/VM, z/VSE, zLinux, et al) and primary subsystems (I.E. CICS, DB2, MQ, IMS, et al). Some Mainframe users might deploy alternative database and transaction processing (TP) solutions, but a significant amount of Mainframe software cost is for IBM software products. In the late-1990’s, IBM introduced their PSLC (Parallel Sysplex License Charges), which offered lower aggregate (MSU) pricing for major IBM software products, based upon an eligible configuration (E.g. Resource Sharing). This pricing mechanism had no impact on software cost control, in fact quite the opposite; it was a significant cost benefit to implement PSLC!

In 2000 IBM announced Workload License Charges (WLC), which allowed users to pay for software based upon the workload size, as opposed to the capacity of the machine; thus the first signs of sub-capacity pricing. In 2001, the ability to deploy IBM eligible software on a “pay for what you use” basis was possible, as per the Variable Workload License Charge (VWLC) mechanism. Put very simply, a Rolling 4 Hour Average (R4HA) MSU metric applies for eligible IBM software products, where software is charged based upon the peak MSU usage during a calendar month. The Mainframe user pays for VWLC software based upon the R4HA or Defined Capacity (Sub-Capacity vis-à-vis Soft Capping), whichever is lowest.

From this point forward, and for the avoidance of doubt, for the last 10 years or so, there has been a mandatory requirement to consider the impact of IBM WLC software costs, when performing the Mainframe Capacity Planning activity. One must draw one’s own conclusions as to whether each and every Mainframe user has the skills to know the intricacies of the various software (E.g. IPLA, OIO, PSLC, WLC, et al) pricing models, when upgrading their zSeries server.

With the IBM Mainframe Charter in 2003, IBM stated that they would deliver a ~10% technology dividend benefit, loosely meaning that for each new Mainframe technology model (I.E. z9, z10), a lower MSU rating of 10% applied for the for the same system capacity level, when compared with the previous technology. Put another way, a potential ~10% software cost reduction for executing the same workload on a newer technology IBM Mainframe; so encouraging users to upgrade. However, the ~10% software cost reduction is subjective, because a higher installed MSU capacity dictates lower per MSU software costs…

With the introduction of the z196 and z114 Mainframe servers the technology dividend was delivered in the form of a new software license charge, AWLC (Advanced Workload License Charges), where lower software costs only applied if this new pricing model was deployed. A similar story for the zEC12 server, the AWLC pricing model is required to benefit from the lower software costs! If these software pricing evolutions were not enough, in 2011 IBM introduced the Integrated Workload Pricing (IWP) mechanism, offering potential for lower software pricing based upon workload type, namely a WebSphere eligible workload. Finally, and as previously alluded to, as MSU capacity increases, the related cost per MSU for software decreases, so there are many IBM software pricing mechanisms to consider when adding Mainframe CPU capacity. So once again, who is the IBM Mainframe Software Cost Control specialist in your organization?

For sure, each and every IBM Mainframe user will engage their IBM account team as and when they plan a Mainframe upgrade process, but how much “customer thinking is outsourced to IBM” during this process? Wouldn’t it be good if there was an internal “checks & balances” or due diligence activity that could verify and refine the Mainframe Capacity Plan with IBM software cost control intelligence?

Having travelled and worked in Europe for 20+ years, I know my peers, colleagues and friends that I have encountered can concur with my next observation. The English and Americans might come up with a good idea and perhaps product, the French are most likely to test that product to destruction and identify numerous new features, while the Germans will write the ultimate technical manual…

zCost Management are a French company that specializes in cost optimization services and solutions for the IBM Mainframe. From an IBM Mainframe Capacity Planning & Software Cost Control Interaction viewpoint, they have developed their CCP-Tool (Capacity and Cost Planning) software solution. This software product bridges the gap between Mainframe Capacity Planning for hardware and the impact on associated IBM software (E.g. WLC, IPLA, et al) costs.

CCP-Tool facilitates medium-term (E.g. 3-5 year) Mainframe Capacity Planning by controlling Monthly License Charges (MLC) evolution, generating cost control policies, optimizing zSeries (E.g. PR/SM) resource sharing, delivering financial management via IBM Mainframe software cost control activity. CCP-Tool integrates with existing data and activities, using SMF Type 70 & 89 records, defining events (I.E. Capacity Requirements, Workload Moves) in the plan, simulating many options, delivering your final capacity plan and periodically (I.E. Quarterly) reviewing and revising the plan. Most importantly, CCP-Tool deploys many algorithms and techniques aligned to IBM software pricing mechanisms, especially WLC and R4HA related.

Therefore CCP-Tool delivers a financial management framework via a medium-term Capacity Plan with associated software cost control and zSeries (E.g. PR/SM) resource policies. This enables a balanced viewpoint of future Data Centre cost configurations from both a hardware and related IBM Mainframe software viewpoint. Moreover, for those IBM Mainframe users that don’t necessarily have the skills to perform this level of Mainframe cost control, CCP-Tool delivers a low cost solution to empower the Mainframe customer to engage IBM on an equal footing, at least from a reporting viewpoint. Similarly, for those Mainframe users with good IBM Mainframe software cost control skills, CCP-Tool offers a “checks & balances” viewpoint, delivering that all important due diligence sanity check! Quite simply, CCP-Tool simplifies the process of reconciling the optimal configuration both from an IBM Mainframe hardware and related software viewpoint.

Without doubt, if a Mainframe user still deploys a hardware centric viewpoint of the capacity planning activity, without considering the numerous intricacies of IBM Mainframe software pricing, in most cases, this could be a significant cost oversight. Put very simply, a low-end IBM Mainframe user of ~150 MSU (1,000 MIPS) might spend ~£1,000,000 per annum, just for a minimal configuration of z/OS, CICS, COBOL and DB2 software, so one must draw one’s own conclusions regarding the potential cost savings, when deploying the optimal zSeries hardware and associated IBM software configuration. I paraphrase Oscar Wilde:

“The definition of a cynic is someone that knows the price of everything, and the value of nothing!”

So, let’s reprise. You have performed your Mainframe Capacity Planning activity, considered historical SMF data for CPU usage, maybe including the R4HA metric, factored in additional new and growth business requirements, refined the capacity plan by using the zPCR tool, perhaps with data input from CPU MF and you now have identified your optimum zSeries Mainframe server?

Maybe you should think again, because the numerous IBM MLC software pricing mechanisms could impact your tried and tested Mainframe CPU hardware planning process. Firstly, for MLC software, the unit cost per MSU reduces, as the installed MSU capacity increases. In simple terms, this encourages the use of “large container” processing entities, LPARs and CPCs. Other AWLC and IWP related considerations further encourages the use of major subsystems (E.g. CICS, DB2, WebSphere, IMS) in larger MSU capacity LPARs and CPCs to benefit from the lowest unit cost per MSU. Additionally, do you really need to run all software on all processing entities? For example, programming languages (E.g. COBOL, PL/I, HLASM, et al) are not necessarily required in all environments (E.g. Test, Development, Production, et al). It is not uncommon for compile and link-edit functions to be processed in Development environments only, while only run-time libraries are required for Production. These “what if” scenarios generated by the numerous IBM MLC software pricing mechanisms must be considered, ideally by an internal resource, with the requisite skills and experience.

Today, who is performing this Mainframe Software Cost Control in your organization? Is it an internal resource with the requisite skills, an independent 3rd party, IBM or nobody? One must draw one’s own conclusions as to whether any of these parties who could perform this vital activity has a vested interest or not, and thus a potential conflict of interests…