Mainframe ISV Software: Is Continuous Product Improvement Always Evident?

Ken Venturi once said “I don’t believe you have to be better than everybody else.  I believe you have to be better than you ever thought you could be”.

Wouldn’t it be great if every CTO and/or Product Manager had this same philosophy for their Mainframe software solution?  One such example I have experienced over the years is (E)JES from Phoenix Software International (PSI).  Of course it’s really important to have Day 1 support for the latest Operating System release, z/OS 2.1 being the latest example, but what about actually exploiting the latest functionality available with the latest zSeries Mainframe Enterprise Servers and z/OS Operating Systems?

To drive maximum bang for your buck, optimal performance and robust cost optimization are only possible by recognizing and exploiting the latest Mainframe function ASAP, as and when appropriate.  Furthermore, listening to your customers, analysing their feedback, actively participating in User Organizations such as SHARE, and so on, will all help in continuous product development and innovation.

Here are some of the reasons why (E)JES has succeeded over a 30+ year period, recognizing and exploiting new z/OS function, as and when the updated z/OS is released for General Availability (GA).  Even today, with Version 5.3 supporting z/OS 2.1 as of day 1, (E)JES continues to offer value-added function for the seasoned, inexperienced and in fact, all IBM Mainframe technicians:

  • 64-bit performance optimizations (I.E. MEMLIMIT: above-the-bar) for both (E)JES client and server components, safeguarding minimal z/OS resource usage.
  • Nearly all (E)JES JES subsystem processing routines are eligible for zIIP redirection, delivering software cost savings for all (E)JES users.  Sub-Capacity System z processor users experience improved (E)JES performance because zIIP engines always run at full speed.  This behaviour differs from that of General Purpose CPs, “throttled” with Sub-Capacity deployments.
  • (E)JES code executes faster via its inbuilt High Performance Routine (HPR) facility, specifically developed to speed up access to data in JES control blocks.  HPRs have a shorter instruction path length than previous coding techniques, avoiding delays in modern zSeries CPU instruction pipelines.
  • If High Performance FICON (zHPF) is available, (E)JES uses Transport Mode channel programs for JES Spool I/O.  When zHPF is not available, or when a CAS server performs I/O against the global data set, (E)JES uses the highest-performing Command Mode channel programs currently available.  These channel programs perform I/O significantly faster than “ordinary” channel programs.
  • The use of 24-bit (captured) UCBs puts a strain on the 24-bit virtual storage resource.  The use of ordinary (non-extended) TIOT entries puts a limit on the total number of allocations that can exist simultaneously in an address space.  (E)JES supports and uses 31-bit (uncaptured) UCBs and the extended TIOT (XTIOT) function (I.E. NON_VSAM_XTIOT=YES in DEVSUPxx PARMLIB).
  • (E)JES supports placement of JES spool data sets in the cylinder-managed area of an Extended Address Volume (EAV).  Of course, as of z/OS 1.12, EAV increases 3390 DASD capacity to ~1 TB.
  • (E)JES Pattern Utility Matching uses the SRST hardware instruction.  Empirical measurements show this technique is far faster on modern System z processors than alternatives such as the TRT instruction or “brute force” matching techniques using CLI/CLC.

One of the primary benefits of upgrading IBM z/OS software is the overall system performance benefit and associated cost reduction, but of course, IBM can only deliver the function and ability, while it’s incumbent upon the ISV community to upgrade their software products accordingly.  A key goal for any good ISV software product is to try to provide a value-add in the area of performance.  This has been one of the primary areas of focus for (E)JES since its introduction in 1978. 

Most spool display and management products tend to rely on the most resource-intensive interface available, namely the JES subsystem provided SSI 80.  (E)JES benchmarking tests against the most readily-available JES SSI 80 exploiters demonstrate significant CPU savings when deploying (E)JES.

Software products also need to deliver continuous improvements with regard to usability, presentation and in-built function, increasing user and system administrator productivity.  Without doubt, optimization encompasses not just hardware, but software, services, systems management disciplines and “best practices” that tie it all together.  Here are some of the usability enhancements that (E)JES has incorporated:

  • ISPF users running a 3270 emulator on a programmable workstation can now search IBM Eclipse-based InfoCenters via (E)JES.  Although (E)JES fully supports BookManager format documentation, BookManager READ/MVS is now obsolete; beginning with z/OS 2.1, BookManager softcopy books are no longer delivered by IBM.  IBM has stated that InfoCenters, and eventually KnowledgeCenters, are their strategic direction for online documentation.
  • (E)JES Web is a new, browser-based interface to (E)JES.  The associated RESTful API delivering this web-enabled technology provides a framework for the creation of Eclipse plug-ins, mobile applications, and other web services clients.  This facility will provide a “rapid learning” type facility for (E)JES users, both new and old, who might be uncomfortable navigating traditional 3270 interfaces.
  • (E)JES provides a Java Application Programming Interface (API), complementing other in-built APIs for REXX and procedural languages.  By using an (E)JES API, a user can harness the versatility of their preferred programming language to interface and interact with (E)JES.  This support provides an interface to deliver nearly all of the capabilities available to an interactive (E)JES user.
  • (E)JES incorporates a context-sensitive help function, with point-and-shoot/pop-up dialogs, helping educate users on (E)JES, JES and z/OS while they work.  Users can get pop-up explanations of columns, input choices for unprotected fields, and a list of line commands.  Smart pop-ups explain the contents of certain columns, such as system abend codes.

The latest (E)JES Release Information Manual eloquently details the product enhancements over the last 5 releases or so, providing a good Product Roadmap reference point.

So, whether the ISV software product you deploy has been available for several years or several decades, do you safeguard maximum business benefit for optimal cost by considering:

  • Does the ISV deploy the latest zSeries server (I.E. zBC12, zEC12) for software interoperability and full hardware function exploitation; or an emulation (I.E. zPDT) technique?
  • Does the ISV deliver value-added z/OS related function on Day 1 or even within a year of the latest z/OS release?
  • Does the ISV deliver meaningful function to assist your users in deploying said function, while simplifying environment management for system administrators?
  • Does your ISV product optimize cost, with Sub-Capacity pricing in MSU increments, aggregated MSU costs for your entire zSeries Mainframe environment, as opposed to specific workloads (E.g. CPCs, LPARs, et al)?
  • Does your ISV product optimize cost by offloading the majority of its CPU function to zIIP specialty engines, which run at maximum speed, and where software “runs for free”?

Of course, only you can ask and potentially answer these questions during your day-to-day activities of maintaining currency and optimal performance for your Mainframe software portfolio.

Sometimes the hardest questions anybody can ask are the questions they ask themselves, which are never rhetorical questions!  Extracted verbatim from the latest (E)JES Release Information Manual:

Team (E)JES took advantage of the Phoenix Software International zHISR performance analysis product to discover performance “hot spots” in the (E)JES product.  Sometimes the simplest, least conspicuous piece of code turns out to be a major CPU contributor.  See below for some of the most embarrassing “surprise” hot spots we discovered using zHISR in a z/OS 2.1 LPAR:

  • Over 30% of the CPU used during a Spool Data Browse FIND operation, against a multi-million-line SYSOUT in JES2, turned out to be code that was clearing a record buffer to blanks using MVCL.  This clearing code was eliminated and some minor adjustments were made in other code to compensate for this change.
  • 27% of the CPU used to produce the Activity display in JES2 turned out to be in a routine that manages an internal resource called the “Job Positions Table.”  The algorithm was improved (to work more like its JES3 counterpart) and that routine is no longer a significant CPU contributor.
  • 9% of (E)JES session start-up was a 26-year-old “brute force” prime number generator used to compute the size of a hash table.  That code was totally reworked and now accounts for approximately .02% of session start-up CPU.
  • A 6% performance penalty was observed when sorting a tabular display with a moderate number of rows. The hot spot turned out to be the code that cleared the work area for the sort service to zeros (another MVCL). This overhead was reduced to .04%.
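
The start-up hot spot in the list above, a “brute force” prime number generator used to size a hash table, is easy to picture in code.  What follows is only a minimal sketch in Python, not the (E)JES code (which the source does not show); the function names are hypothetical.  It contrasts a naive primality test, which tries every smaller divisor, with one that stops at the square root:

```python
import math


def is_prime_brute_force(n: int) -> bool:
    """Naive test: try every candidate divisor from 2 up to n - 1."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, n))


def is_prime_sqrt(n: int) -> bool:
    """Cheaper test: only odd divisors up to the integer square root of n."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    return all(n % d != 0 for d in range(3, math.isqrt(n) + 1, 2))


def next_prime(minimum: int, is_prime=is_prime_sqrt) -> int:
    """Return the smallest prime >= minimum, e.g. to size a hash table."""
    candidate = max(minimum, 2)
    while not is_prime(candidate):
        candidate += 1
    return candidate
```

Both tests give the same answer; the difference is the number of divisions per candidate, which grows with the table size for the naive version but only with its square root for the second.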

Mea culpa and humility are never a bad thing, but you have to be honest with yourself and ask yourself the right questions!  So going back full circle and quoting Ken Venturi once again, “I don’t believe you have to be better than everybody else.  I believe you have to be better than you ever thought you could be”.  You must draw your own conclusions as to whether such an observation applies to the (E)JES team at Phoenix Software International (PSI)…

Why not ask them yourself?  Ed Jaffe, the (E)JES CTO, will be available at the forthcoming UK GSE Annual Conference, 5-6 November 2013, speaking about (E)JES System Management Software: More With Less For Less, For The z/OS Mainframe and z/OS 2.1 User Experiences.

21st Century Mainframe Capacity Planning Requirements

With nearly 5 decades of longevity the IBM Mainframe has changed beyond recognition in terms of CPU capacity and performance capability.  The Capacity Planning discipline for the IBM Mainframe server became more advanced and proactive in the early 1990’s, perhaps coinciding with the introduction of Parallel Sysplex structures associated with the MVS/ESA operating system.  Therefore the requirement to measure and model the impact of workload movement between LPAR and CPC structures became important, if not mandatory.

The fundamental building-block for Mainframe CPU usage analysis is SMF Type 7n records (I.E. RMF or CMF), where this data was typically processed by MXG, MICS and maybe CIMS (acquired by IBM), generally using SAS for reporting purposes.  Other tools, including but not limited to, BEST/1 (acquired by BMC) and PERFMAN (acquired by ASG) also offered capacity planning and performance management solutions.  Therefore, for 20+ years the fundamental Mainframe CPU usage data and associated tools have remained largely the same.  However, maybe the IBM Mainframe server has changed, both in terms of underlying CPU chip technology and customer workload deployment…

I often hear capacity planners state something along the lines of “I can report on the past with 100% accuracy, but predicting the future might prove to be a little more difficult”!  Once again, going back to the early 1990’s, the IBM Mainframe had a typical if not generic workload profile deployment, namely On-Line Transaction Processing (E.g. CICS, IMS DC) and related Database Management Subsystems (E.g. DB2, IMS DB) with Batch Processing.  This somewhat limited workload profile simplified the Capacity Planning process, applying estimates of growth based on current usage.  However, when the Mainframe became more pervasive, taking on new workloads, how was the capacity planner supposed to estimate CPU requirements for their new business application workload?
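
For the stable OLTP, database and batch profile described above, “applying estimates of growth based on current usage” amounts to compounding today’s peak usage forward.  The sketch below assumes a single annual growth rate; the function name and the figures in the usage note are invented for illustration and are not taken from any real installation:

```python
def project_mips(current_mips: float, annual_growth: float, years: int) -> list[float]:
    """Compound the current peak MIPS usage forward, one value per future year.

    annual_growth is a fraction, so 0.10 means 10% growth per year.
    """
    if years < 0:
        raise ValueError("years must not be negative")
    return [current_mips * (1 + annual_growth) ** year for year in range(1, years + 1)]
```

As a usage example, `project_mips(1000, 0.10, 3)` projects roughly 1,100, 1,210 and 1,331 MIPS for the next three years.  A single growth rate has nothing to say about a brand new workload with no usage history, which is exactly the difficulty raised above.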

IBM introduced the Large Systems Performance Reference (LSPR) methodology, designed to provide relative processor capacity data for IBM System/370, System/390 and z/Architecture processors.  All LSPR data is based on a set of measured benchmarks and analysis, covering a variety of System Control Program (SCP) and workload environments.  LSPR data is intended to be used to estimate the capacity expectation for a production workload when considering a move to a new processor.  Although LSPR data is provided on an “as is” basis, with no warranty, it at least provides the Mainframe Capacity Planner with some insight into their CPU sizing challenge.  For many years, LSPR was the only data source other than RMF (CMF) for Mainframe CPU sizing.  However, is there a more accurate data source, perhaps based on real-life customer data?

With the introduction of the IBM System z10 server (February 2008), a new function CPU MF (CPU Measurement Facility) was incorporated.  Let’s not forget, z10 is now an n-2 technology, having been superseded by the z196/z114 and the latest zBC12/zEC12 generation of servers.  So each and every committed Mainframe customer should be positioned to benefit from the CPU MF function.

CPU MF provides optional hardware assisted collections of information about logical CPU activity executed over a specified interval in selected Logical Partitions (LPARs).  The CPU MF counters function is intended to be run on a constant basis to collect long-term performance data (I.E. SMF Record 113), in a similar manner to how you collect other performance data.  Therefore this data source can be deployed to further refine the accuracy of Mainframe CPU capacity planning projections.  Let’s not forget:

“The primary on-going requirement for Mainframe Capacity Planning is to minimize any over or under capacity provision from forecast predictions, used for Mainframe server acquisition purposes”

Mainframe chip technology has also changed in complexity, especially with the latest iterations of CPU chips from the z10 server onwards (the z10 chip sharing design elements with POWER6), incorporating many layers of cache memory.  Workload capacity performance will be quite sensitive to how deep into the memory hierarchy the processor must go to retrieve the workload’s instructions and data for execution.  Best performance occurs when the instructions and data are found in the cache(s) nearest the processor so that little time is spent waiting prior to execution; as instructions and data must be retrieved from farther out in the hierarchy, the processor spends more time waiting for their arrival.

As workloads are moved between processors with different memory hierarchy designs, performance will vary as the average time to retrieve instructions and data from within the memory hierarchy will vary.  Additionally, once on a processor this component will continue to vary significantly as the location of a workload’s instructions and data within the memory hierarchy is affected by many factors including: locality of reference, IO rate, competition from other resources (E.g. Applications, LPARs, et al), and so on…

The most performance sensitive area of the memory hierarchy is the activity to the memory nest, namely, the distribution of activity to the shared caches and memory.  IBM introduced new terminology, namely Relative Nest Intensity (RNI), indicating the level of activity to this part of the memory hierarchy.  Using data from CPU MF, the RNI of the workload running in an LPAR may be calculated.  The higher the RNI, the deeper into the memory hierarchy the processor must go to retrieve the instructions and data for that workload.
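
In outline, RNI is a weighted sum of how often the processor has to go to each level of the nest, with heavier weights for the deeper (slower) levels.  The sketch below shows only that shape.  The level names and weights are placeholders chosen for illustration; they are NOT IBM’s published coefficients, which are specific to each processor generation and should be taken from IBM’s LSPR and CPU MF documentation:

```python
def relative_nest_intensity(sourced_pct: dict[str, float],
                            weights: dict[str, float],
                            scale: float = 1.0) -> float:
    """Weighted sum of the percentage of cache misses sourced from each nest level.

    sourced_pct: percentage (0-100) of misses satisfied from each level.
    weights:     relative cost of each level (placeholder values in this sketch).
    scale:       generation-specific scaling factor (placeholder, defaults to 1.0).
    """
    missing = set(weights) - set(sourced_pct)
    if missing:
        raise KeyError(f"no sourcing percentage for: {sorted(missing)}")
    weighted = sum(weights[level] * sourced_pct[level] for level in weights)
    return scale * weighted / 100.0


# Placeholder weights: deeper levels of the hierarchy cost more.
PLACEHOLDER_WEIGHTS = {"shared_cache_local": 1.0, "shared_cache_remote": 2.5, "memory": 7.5}

# Two invented workloads: the second goes to memory far more often.
light = relative_nest_intensity(
    {"shared_cache_local": 10.0, "shared_cache_remote": 2.0, "memory": 1.0},
    PLACEHOLDER_WEIGHTS)
heavy = relative_nest_intensity(
    {"shared_cache_local": 20.0, "shared_cache_remote": 6.0, "memory": 4.0},
    PLACEHOLDER_WEIGHTS)
```

With these invented inputs the second workload scores higher than the first, reflecting its deeper reach into the memory hierarchy.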

Therefore the Mainframe Capacity Planner does have various data sources available to forecast how an existing or new workload might perform on an upgraded processor (CPC), further refining their CPU capacity requirement forecast.  As always, the final stage in a Mainframe Capacity Planning process is to input the forecast data into the IBM Processor Capacity Reference (zPCR) tool, to determine the exact model and associated resource configuration options for their unique business workload mix.

To summarize, does your Mainframe Capacity Planning process incorporate all of these CPU sizing data sources, in an easy-to-use and cost efficient manner?

Founded by former IBM staffers and capacity planning and performance management industry veterans William Shelden, PhD, and William Hart, PerfTechPro is designed to deliver sophisticated, affordable, easy-to-use solutions for IT management professionals looking for fast, insightful help without high-cost, complex and time-consuming purchasing and licensing requirements.

PerfTechPro for z/OS is a Capacity Planning and Performance Measurement tool specifically designed for the cost conscious and savvy 21st Century data centre.  PerfTechPro for z/OS is the next evolution in Mainframe Capacity Planning tools, having been architected from the ground up using the latest techniques.  PerfTechPro for z/OS provides sophisticated capacity and performance management capabilities, affordable for a data centre of any size:

  • Clean, intuitive, easy-to-use interface and graphical representations, for example:
    • Consolidated instance lists guide users to make informed selections
    • Descriptive dialog boxes detail your configuration
    • Anticipates, pre-loads data to speed retrieval, reporting and analysis
    • Automated data management
  • Forecasting and modelling
  • Non-proprietary database, enabling data use outside of PerfTechPro
  • Capable of automated collection, analysis and reporting of SMF 113 records produced by the IBM CPU Measurement Facility (CPU MF)
  • Supports measurement, management of zAAP & zIIP Specialty Engines
  • Automated analysis and management of all key capacity and performance metrics, for example:
    • GPP Utilization of All LPARs
    • MIPS Usage by CPU
    • DASD Response Times
    • Address Spaces Dispatched and Waiting 

PerfTechPro for z/OS also simplifies the data management process associated with Mainframe Capacity Planning.  Using a streamlined process on the z/OS host, PerfTechPro extracts and formats the data required from various SMF sources (E.g. SMF Type 7n, Type 113); delivering an optimized Performance Data Base (PDB) for use by the Windows based GUI.  This optimized file safeguards fast processing during the reporting and forecasting activities, while simplifying any data aggregation processes (E.g. Weekly, Monthly, et al).  Moreover, PerfTechPro allows this data to be stored in non-proprietary (E.g. Microsoft Access, SQL Server, MySQL, Oracle) and multiple database structures, as and if required.

PerfTechPro for z/OS is a simple-to-use and cost-efficient solution, allowing customers to quickly save time and money from their Capacity Planning and Performance Measurement solution.  Ultimately the bottom line objective for PerfTechPro for z/OS is to provide a best-of-breed solution for a very competitive cost. PerfTechPro for z/OS delivers business value by:

  • Ensuring enterprise zSeries Mainframe server resources are being used efficiently
  • Maximizing opportunities for cost-savings
  • Anticipating & responding to increased demand on resources
  • Reducing costs by exploiting periods of lower resource demand
  • Discerning underlying causes of performance and capacity issues
  • Eliminating time-consuming manual tracking, recording and analysis
  • Implementing disciplined management of valuable business resources

In conclusion, the Mainframe Capacity Planning process continues to evolve, forever striving to reduce any discrepancies in CPU requirements forecasting, which of course, have a high associated cost consideration.  Integrating CPU MF (SMF Type 113) must be a mandatory requirement, ensuring that CPU Sizing, Forecasting, Modelling and Correlation Analysis activities are optimized.  Additionally, the actual process of Mainframe Capacity Planning is an activity that requires great skill and considerable associated responsibility.  A modern day solution such as PerfTechPro for z/OS is worthy of consideration, having been designed by a team with a heritage in delivering Mainframe Capacity Planning solutions, and architected to exploit modern day function while taking the latest zSeries CPU chip designs into account.

Application Performance Tuning – Why Bother?

With older generations of Mainframe Operating Systems, certainly MVS/XA and perhaps MVS/ESA, application performance tuning was a necessity, not an afterthought.  Quite simply, the cost of Mainframe resources, namely CPU, memory and disk, dictated that your mission critical business application might not perform to business requirements, unless you tuned your programming code.  Programmers, both of the system and application variety, understood the bits and bytes of available programming languages (E.g. ASM, COBOL, PL/I) and Operating System (I.E. MVS), collaborating either via proactive process, or reactive problem solving.  With the continuing reduction of IT hardware component costs, the improvement in Operating Systems (E.g. 64-bit architecture) and newer programming languages (E.g. C, C++), it seems that application performance tuning is somewhat of an afterthought, but at what cost?

We all know that the cost of a Mainframe MIPS is significant, and although it might have reduced dramatically from a hardware viewpoint, from a software viewpoint, the cost remains largely static at ~£1,500-£3,500, per year, depending on your configuration.  So if your applications are burning several hundred if not several thousand extra MIPS unnecessarily, that’s very expensive indeed!  Additionally and just as importantly, a badly tuned system will manifest itself in slower transaction response times and longer batch jobs, if applicable, which could impact service availability.  So why is there a seeming reluctance to tune business applications, Mainframe resident or not?
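
To put a figure on “very expensive indeed”, the arithmetic is a simple multiplication of the wasted MIPS by the per-MIPS software cost range quoted above.  The function name below is hypothetical and the MIPS figure in the usage note is an invented example:

```python
def wasted_cost_range(wasted_mips: float,
                      low_per_mips: float = 1_500.0,
                      high_per_mips: float = 3_500.0) -> tuple[float, float]:
    """Annual software cost (GBP) of unnecessary MIPS, as a (low, high) range.

    The defaults are the ~£1,500-£3,500 per MIPS per year range from the text.
    """
    if wasted_mips < 0:
        raise ValueError("wasted_mips must not be negative")
    return (wasted_mips * low_per_mips, wasted_mips * high_per_mips)
```

For example, `wasted_cost_range(500)` returns `(750000.0, 1750000.0)`: an application estate burning 500 unnecessary MIPS costs somewhere between £750,000 and £1.75 million per year in software charges alone.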

If ever there was a functional IT area where the skills gap has never been wider, then application performance tuning is said skill, when comparing the salty old sea dog Mainframe dinosaur, with the newer Mainframe technician!

From an application development process viewpoint, where does the application performance tuning task live; before or after implementation?  The cynical amongst us will know; if it’s after implementation, there’s a strong likelihood said activity will never be performed!  If it’s before implementation, how many projects incorporate a meaningful stress test, or measure transaction response times versus an SLA or KPI metric?  Additionally, if the project is high-priority and/or running behind schedule, then performance testing is an activity that is easily removed…

Back in the good old days, the late 1980’s to early 1990’s, some application performance tuning tools did start to emerge, most notably Strobe.  Strobe was useful to even the most accomplished of system and application programmer personnel, and invaluable to less experienced personnel, and so arguably Strobe became the de facto software tool for tuning Mainframe applications.  However, later releases of MVS (E.g. OS/390 and z/OS), the non-event that was the Year 2000 (Y2K), seemed to remove the focus on and importance of application tuning.

Arguably most importantly of all, that software MIPS cost item, where Strobe and its competitors (E.g. ASG/BMC TriTune, CA Application Tuner, IBM APA, Macro4 ExpeTune, et al) will utilize even more CPU to capture diagnostic trace information, contributed to the demise of application performance tuning.  However, those companies that have undertaken such application tuning activities in the last decade or so are sitting pretty, having reduced the CPU (MIPS) resource consumed, lowering TCO and optimizing performance accordingly.  In the 21st Century, these software solutions are classified as Application Performance Management (APM) solutions.

Is there a better and easier way to stimulate an interest in the application performance tuning discipline?  If the desire exists to tune an application, lowering CPU MIPS usage, optimizing service performance, then the traditional tools and methods mentioned previously exist, but perhaps a new (or not so new) CPU performance data source exists…

With the introduction of the z10 server, a new function CPU MF (CPU Measurement Facility) was incorporated.  Let’s not forget, z10 is now an n-2 technology, having been superseded by the z196/z114 and the latest zBC12/zEC12 generation of servers.  So each and every committed Mainframe customer should be positioned to benefit from the CPU MF function.

CPU MF provides optional hardware assisted collections of information about logical CPU activity executed over a specified interval in selected Logical Partitions (LPARs).  The CPU MF counters function is intended to be run on a constant basis to collect long-term performance data (I.E. SMF Record 113), in a similar manner to how you collect other performance data.  I have previously briefly discussed how CPU MF SMF data can be used to increase Mainframe Server Capacity Planning efficiencies. 

The CPU MF sampling function is a short duration, precise function that identifies where CPU resources are being used, to help you improve application efficiency.  Put very simply, CPU MF sampling data has minimal CPU overhead (E.g. ~0.1-1.0%) when collecting data (I.E. z/OS Hardware Instrumentation Services – HIS), but this data can then be used to identify CPU “hot spots”, which can then be further analysed to identify the “areas of code” generating the high CPU usage.  However, it was forever thus, whether an APM tool, or CPU MF sampling data, high CPU usage can be identified, but the application programmer must undertake the task of optimizing the application code!
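
Conceptually, finding a hot spot from sampling data means counting how many samples land in each piece of code and ranking the results.  The sketch below assumes the samples have already been resolved to a module and CSECT; the `Sample` record is a hypothetical, simplified stand-in and does not represent the actual HIS sample format or how any particular product processes it:

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Sample(NamedTuple):
    """One hypothetical instruction-address sample, already resolved to code."""
    module: str
    csect: str
    offset: int


def hot_spots(samples: Iterable[Sample], top: int = 3) -> list[tuple[str, str, float]]:
    """Return the top CSECTs with their share of all samples, as a percentage."""
    counts = Counter((s.module, s.csect) for s in samples)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(module, csect, 100.0 * n / total)
            for (module, csect), n in counts.most_common(top)]


# Invented sample data: ten samples spread across three CSECTs.
demo_samples = (
    [Sample("PAYROLL", "CALCTAX", 0x1A0)] * 6
    + [Sample("PAYROLL", "FMTLINE", 0x040)] * 3
    + [Sample("PAYROLL", "INIT", 0x010)] * 1
)
demo_report = hot_spots(demo_samples)
```

In this invented data, the CALCTAX CSECT accounts for 60% of the samples, so that is where the application programmer would look first.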

IBM have done a great job in providing CPU MF counters data, optimizing the Capacity Planning process with the SMF 113 record, and the realm of possibility exists with the sample data, but a software solution is required to analyse and summarize this data.

Currently there are very few, if not only one, software solutions that analyse CPU MF sample data, namely zHISR from Phoenix Software International.  zHISR interfaces directly with z/OS Hardware Instrumentation Services to collect data for hotspot analysis of customer, vendor, or operating system program execution.  zHISR features include:

  • Support for up to 128 simultaneous data collection events.  zHISR collections do not interfere with any HIS functions including sample or counter collection.
  • System console commands for many zHISR functions.
  • An Application Programming Interface to COBOL and Assembler for starting and stopping data collections. Collection lengths for API generated collections have a time range of one second or more.
  • Ability to schedule a collection with JCL so that collection starts when a given job or step begins.
  • Ability to store data collections as z/OS data sets or UNIX files.
  • Support for collections against CICS/TS transactions.
  • Analysis based on a time range within the collected data for a narrower spotlight on problem code.

An intuitive ISPF dialog allows the user to easily produce a CPU hot spots analysis, which can then be used for identifying the offending code sections.  The user can then drill down and highlight the high CPU CSECT and program offset (instruction), comparing with their Associated Data (ADATA), and thus the source programming instruction.  Therefore the skill required to perform analysis is minimal, as is the CPU overhead in collecting analysis data, removing the potential barriers when embarking on an application tuning initiative.  Furthermore, the actual cost of deploying the zHISR software is not onerous and so perhaps each and every committed Mainframe user can easily include application performance tuning in their application development lifecycle processes.

zHISR has a UNIX file system interface that lets you navigate the system and browse or delete files.  With zHISR, users can start and stop hardware event data collections and view the status of the current or prior HIS run.  zHISR also includes a memory display/alter utility that lets you view main storage in the CPU you are logged on to.  If zIIPs are present and zHISR is defined as an authorized subsystem, nearly all of the CPU processing used by zHISR is redirected to a zIIP.

There are also instances, however few and far between, where Mainframe customers have written their own proprietary in-house OLTP (On-Line Transaction Processor) and Relational Database Management Subsystem (RDBMS), where traditional APM software tools can’t provide a solution, only interfacing with underlying subsystems (E.g. Adabas, CICS, DB2, IDMS, WebSphere, et al).  In these instances, CPU MF and zHISR offer a solution to help such customers, who probably face challenges when they upgrade their Mainframe servers, safeguarding software and application code is compatible with the new hardware, and ideally, exploits the latest functionality.

In conclusion, application performance tuning has to be a very important if not mandatory activity for the Mainframe Data Centre.  Whether via CPU MF or traditional APM software solutions, the cost reduction and performance improvement benefits of tuning should be compelling reasons to proactively engage in application tuning activities.  From a skills viewpoint, maybe the KISS (Keep It Simple Stupid) principle can apply, where CPU MF collects the data very simply and efficiently, complemented by zHISR, analysing the data in an intuitive and cost optimized manner.

So turning the subject matter on its head, Application Performance Tuning – Why Bother?  Why not!

Further information can be found from my z/OS Application Performance Tuning presentation, delivered at UK GSE in November 2012.

The Problem With Problems – Are You zAware?

Several decades ago and observing potential challenges with hardware, most of us seasoned Mainframe folk would have been familiar with the terms Mean Time Between Failure (MTBF) and Mean Time To Repair (MTTR), although repair might become resolution, replacement, and so on.  As hardware has become more reliable, with very few if any single points of failure, we don’t really use these terms for hardware, but perhaps if we don’t use them for problems associated with our business applications, we should…
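
The two terms combine into the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR), which applies just as well to a business application as to hardware.  A minimal sketch follows; the function name is hypothetical and the figures in the usage note are invented:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as a fraction: MTBF / (MTBF + MTTR)."""
    if mtbf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTBF and MTTR must not be negative")
    if mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR cannot both be zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)
```

An application that fails on average once every 999 hours and takes 1 hour to restore gives `availability(999, 1)`, roughly 0.999 or “three nines”.  Halving the time to resolution improves the figure as surely as doubling the time between failures, which is why problem determination matters.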

Today we generally simplify this area of safeguarding business processing metrics (E.g. SLA, KPI) with the Reliability, Availability and Serviceability (RAS) terminology.  So whether hardware related by an IHV such as IBM, or software related by ISV’s such as ASG, BMC, CA, IBM, naming but a few, or application code writers, we’re all striving to improve the RAS metrics associated with our IT discipline or component.

There will always be the ubiquitous software bugs, human error when making configuration changes, and so on, but what about those scenarios we might not even consider to be a problem, yet they can have a significant impact on our business?  An end-to-end application transaction could consist of an On-Line Transaction Processor (OLTP, E.g. CICS, IMS, et al), a Relational Database Management Subsystem (RDBMS, E.g. DB2, ADABAS, IDMS, et al), a Messaging Broker (E.g. WebSphere MQ), a Networking Protocol (E.g. TCP/IP, SNA, et al), with all of the associated application infrastructure (E.g. Storage, Operating System, Server, Application Programs, Security, et al); so when we experience a “transaction failure”, which might be performance related, which component failed or caused the incident?

Systems Management disciplines dictate that Mainframe Data Centres deploy a plethora of monitors (E.g. ASG-TMON, BMC MAINVIEW, CA SYSVIEW, IBM Tivoli OMEGAMON, et al), yet these software solutions typically generate a significant amount of data, when what we really need for successful problem solving is the right amount of meaningful information.

So ask yourself the rhetorical question; you know it.  How many application performance issues remain unsolved, because we just can’t identify which component caused the issue, or there is just too much data (E.g. System Monitor Logs) to analyse?  If you’re being honest, I guess the answer is greater than zero, perhaps significantly greater.  Further complications can occur because of the collaboration required to resolve such issues, as each discipline (Transaction, Database, Messaging, Networking, Security, General Systems Management, Performance Monitoring) typically resides in a different team…

IBM System z Advanced Workload Analysis Reporter (IBM zAware) is an integrated, self-learning, analytics solution for IBM z/OS that helps identify unusual system behaviour in near real time.  It is designed to help IT personnel improve problem determination so they can restore service quickly and improve overall availability.  zAware integrates with the family of IBM Mainframe System Management tools, including Runtime Diagnostics, Predictive Failure Analysis (PFA), IBM Health Checker for z/OS and z/OS Management Facility (z/OSMF).

IBM zAware runs in an LPAR on a zEC12 or later CPC.  Just like any System z LPAR, IBM zAware requires processing capacity, memory, disk storage, and connectivity.  IBM zAware is able to use either general purpose CPs or IFLs, which can be shared or dedicated.  It is generally more cost effective to deploy zAware on an IFL.

Used together with other Mainframe System Management Tools, zAware provides another view of your system(s) behaviour, helping answer questions such as:

  • Are my systems showing abnormal message activity?
  • When did this abnormal message activity start?
  • Is this abnormal message activity repetitive?
  • Are there messages appearing that have never appeared before?
  • Do the times of abnormal message activity coincide with problems in the system?
  • Is the abnormal behaviour limited to one system or are multiple systems involved?

IBM zAware creates a model of the normal operating characteristics of a z/OS system using message data captured from OPERLOG.  This message data includes any well-formed message captured by OPERLOG (I.E. A message with a tangible Message ID), whether it is from an IBM product, a non-IBM product, or one of your own application programs.  This model of past system behaviour is used as the base against which to compare message patterns that are occurring now.  The results of this comparison might help answer these questions.

IBM zAware determines, using its model of each system, what messages are new or if messages have been issued out of context based on the past normal behaviour of the system.  The model contains patterns of message ID occurrence over a previous period and does not need to know what job or started task issued the message. It also does not need to use the text of a message.
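
The idea of comparing current message IDs against a model of past behaviour can be illustrated with a toy sketch.  This is not IBM's algorithm: the function names, the rarity threshold and the sample message IDs are all illustrative assumptions, and a real model would also consider timing and message context.

```python
from collections import Counter

def build_baseline(history):
    """Count how often each message ID appeared in the training period."""
    return Counter(history)

def classify_interval(baseline, interval_ids, rare_threshold=2):
    """Split the message IDs seen in one interval into new, rare and normal.

    rare_threshold is an assumed cut-off: IDs seen fewer than this many
    times in the baseline are treated as rare.
    """
    result = {"new": [], "rare": [], "normal": []}
    for msg_id in sorted(set(interval_ids)):
        seen = baseline.get(msg_id, 0)
        if seen == 0:
            result["new"].append(msg_id)
        elif seen < rare_threshold:
            result["rare"].append(msg_id)
        else:
            result["normal"].append(msg_id)
    return result

# Hypothetical sample data: only message IDs are used, never message text.
history = ["IEF403I", "IEF404I", "IEF403I", "IEF404I", "IEA989I"]
baseline = build_baseline(history)
report = classify_interval(baseline, ["IEF403I", "IEA989I", "ABC123E"])
print(report)
```

As in the description above, the sketch needs neither the message text nor the name of the job that issued the message, only the message ID.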

In summary, zAware is a self-learning technology, for newer zSeries Servers (I.E. zEC12 onwards), which can help reduce the time to identify the “area” where a problem occurred, or is occurring (E.g. Near Real-Time), allowing a technician to complete the problem diagnosis and consider potential resolutions.  Put very simply, zAware will assist in identifying the problem, but it does not fully qualify the problem and associated resolution.  This is a good quality, as ultimately the human technician must complete this most important of activities!

So what if you’re not a zEC12 user or you’re concerned about increased costs because you don’t deploy IFL speciality engines?

ConicIT/MF is a Proactive Performance Management solution for First Fault Performance Problem Resolution.  By interfacing with standard system monitors (E.g. ASG-TMON, BMC MAINVIEW, CA SYSVIEW, IBM Tivoli OMEGAMON), ConicIT/MF uses sophisticated mathematical models to perform proactive, intelligent and significant data reduction, quickly highlighting possible causes of problems, allowing for efficient problem determination.  Put another way, Systems Management Performance Monitors provide a wealth of data, but sometimes there’s too much data and not enough information.  ConicIT ensures that the data provided by Systems Management Performance Monitors is analyzed and consolidated to expedite performance problem resolution.

ConicIT runs on a distributed Linux system external to the Mainframe system being monitored.  ConicIT is a completely agentless architecture which doesn’t require installation on the Mainframe system being monitored.  It receives data from existing monitors (E.g. ASG-TMON, BMC MAINVIEW, CA SYSVIEW, IBM Tivoli OMEGAMON, et al), through their standard interfaces.  3270 emulation enables ConicIT to appear as just another operator to the existing monitor and adds no more load to the monitored system than would adding an additional human operator.

Until a problem is predicted, ConicIT requests basic monitor information at a very low frequency (about once per minute).  If the ConicIT analysis senses a performance problem brewing, its requests for information increase, but never so much as to affect the monitored system.  The maximum load generated by ConicIT is configurable and ConicIT supports all the major Mainframe monitors.
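
This adaptive polling behaviour might be sketched as follows.  It is a minimal illustration under stated assumptions, not ConicIT's implementation: the anomaly score, the linear scaling and the 5-second floor are hypothetical, with only the roughly once-per-minute baseline taken from the text.

```python
def next_poll_interval(anomaly_score, base_interval=60.0, min_interval=5.0):
    """Return the number of seconds to wait before the next monitor request.

    anomaly_score is assumed to lie in [0, 1]: 0 means nothing suspicious,
    1 means a problem looks imminent.  min_interval caps the load placed
    on the monitored system, however high the score gets.
    """
    score = max(0.0, min(1.0, anomaly_score))
    interval = base_interval - score * (base_interval - min_interval)
    return max(min_interval, interval)

for score in (0.0, 0.5, 1.0, 7.0):
    print(score, next_poll_interval(score))
```

The floor on the interval plays the role of the configurable maximum load: no matter how alarming the analysis, the request rate never rises above one request per `min_interval` seconds.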

The monitor data stream is retrieved by parsing the data from the various (E.g. Log) data sources.  This raw data is first sent to the ConicIT data cleansing component.  Data from existing monitors is very “noisy”, since various system parameter values can fluctuate widely even when the system is running perfectly.  The job of the data cleansing algorithm is to find meaningful features from the fluctuating data.  Without an appropriate data cleansing algorithm it is very difficult or impossible for any useful analysis to take place.  Such cleansing is a simple visual task for a trained operator, but is very tricky for an automated algorithm.

The relevant features found by the data cleansing algorithm are then processed to create appropriate variables.  These variables are created by a set of rules that can process the data and apply transformations to the data (E.g. combine single data points into a new synthesized variable, aggregate data points) to better describe the relevant state of the system.
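
To make the cleansing and variable-creation steps more concrete, here is a small sketch.  The rolling median is a stand-in technique chosen for illustration, not the algorithm ConicIT uses, and the metric names and sample values are hypothetical.

```python
from statistics import median

def rolling_median(values, window=3):
    """Smooth a noisy series: each point becomes the median of the last `window` samples."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(median(values[start:i + 1]))
    return smoothed

def cpu_per_transaction(cpu_pct, tx_rate):
    """Synthesized variable: CPU percent consumed per transaction per second."""
    return [c / t if t else 0.0 for c, t in zip(cpu_pct, tx_rate)]

cpu = [40, 42, 95, 41, 43, 44]          # one isolated spike at index 2
tx = [100, 100, 100, 100, 100, 100]     # transactions per second
clean_cpu = rolling_median(cpu)
print(clean_cpu)
print(cpu_per_transaction(clean_cpu, tx))
```

The isolated spike of 95 disappears from the smoothed series, while a sustained rise would survive the median and remain visible to later analysis.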

These processed variables are analyzed by models that are used to discover anomalies that could be indicative of a brewing performance problem.  Each model looks for a specific type of statistical anomaly that could predict a performance problem.  No single model is appropriate for a system as complex as a large computing system, especially since the workload profile changes over time.  So rather than a single model, ConicIT generates models appropriate to the historical data from a large, predefined set of prediction algorithms.  This set of active models is used to analyze the data, detect anomalies and predict performance degradation.  The active models vote on the possibility of an upcoming problem in order to make sure that as wide a set of anomalies as possible is covered, while lowering the number of false alerts.  The set of active models changes over time based on the results of an offline learning algorithm which can either generate new models based on the data, or change the weighting of existing models.  The learning algorithm is run in the background on a periodic basis.
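
The voting step can be pictured as a weighted majority.  The sketch below is an assumption about how such a vote might work, not a description of ConicIT internals: the model names, weights and threshold are hypothetical.

```python
def weighted_vote(model_votes, weights, threshold=0.5):
    """Return True when the weighted share of models predicting a problem
    reaches the threshold.

    model_votes maps model name -> bool (True = anomaly detected).
    weights maps model name -> non-negative weight; in the scheme described
    above these would be adjusted by the offline learning step.
    """
    total = sum(weights[name] for name in model_votes)
    if total == 0:
        return False
    in_favour = sum(weights[name] for name, vote in model_votes.items() if vote)
    return in_favour / total >= threshold

votes = {"trend": True, "spike": False, "seasonal": True}
weights = {"trend": 5, "spike": 3, "seasonal": 2}
print(weighted_vote(votes, weights))        # trend + seasonal carry 7 of 10
print(weighted_vote(votes, weights, 0.8))   # a stricter threshold
```

Raising the threshold trades missed problems for fewer false alerts, which is the balance the voting scheme is trying to strike.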

When a possible performance problem is predicted by the active models, the ConicIT system takes two actions.  It sends an alert to the appropriate consoles and systems, and also instructs the monitor to collect information from the affected systems more frequently.  The result is that when IT personnel analyze the problem they have the information describing the state of the system and the affected system components as if they were watching the problem while it was happening.  The system also uses the information from the analysis to point out the anomalies that led the system to predict a problem, thereby aiding in root cause analysis of the problem.

So whether zAware or ConicIT, there are solutions to assist today’s busy IT technician to improve the Reliability, Availability and Serviceability (RAS) metric for their business, by implementing practicable resolutions for those problems, which previously, were just too problematic to solve.  zAware can offload its processing to an IFL, as and if available, whereas ConicIT performs its processing on a Non-Mainframe platform, and thus can support all zSeries Servers, not just the zEC12 platform.

Ultimately both the zAware and ConicIT solutions have the same objective, increasing Mean Time Between Failure (MTBF) and decreasing Mean Time To Resolution (MTTR), optimizing IT personnel time accordingly.
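
The link between these two metrics and availability can be shown with the standard steady-state formula, availability = MTBF / (MTBF + MTTR).  The formula is general reliability arithmetic rather than something taken from either product, and the hours used below are hypothetical.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical figures: same failure rate, faster resolution.
before = availability(mtbf_hours=500, mttr_hours=4)
after = availability(mtbf_hours=500, mttr_hours=1)
print(f"{before:.4%} -> {after:.4%}")
```

Even with an unchanged failure rate, cutting the time to resolution raises availability, which is why faster problem determination matters.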

Flash Express – Back To The Future

It’s not just science fiction films that get rebooted or reimagined; the same thing happens in the technology world, although not always as obviously.  Over the years we have seen Star Trek, Star Wars, and numerous superhero stories updated for current day audiences, some better than others, and similarly, flash memory or Solid State Drives (SSD) is not a new concept in the Mainframe world.  For me, I wonder whether the Back To The Future films might be rebooted, and if they are, hopefully it’s done well!  Anyway, I digress, so back to the Mainframe flash memory observations…

Flash Express is a new feature for the zEC12 server, designed to help drive System availability and performance to even higher levels. 

Flash Express cards are delivered as a RAID 10 mirrored pair for superior resiliency and reliability.  In the unlikely event of device failure, Flash Express cards can also be concurrently replaced.  The cards are designed for superior wear levelling and have a long expected lifetime.  From a security perspective, Flash Express stored data remains protected.  Data is encrypted on the Flash Express adapter with 128-bit AES encryption.  Encryption keys are stored on smart cards that are plugged into the SE.  Removing the smart cards renders the data on the card inaccessible. 

In the first instance, IBM seems to be targeting Systems Paging (I.E. Auxiliary Storage Manager – ASM) and System Dumps (E.g. SVC, Standalone) as an introduction to this new level of storage hierarchy. 

Flash Express reduces latency for critical system paging that might otherwise impact the availability and performance.  For system paging flexibility and efficiency, Flash Express is a higher performance option when compared with traditional auxiliary storage (I.E. Disk).  z/OS uses both Flash Express and page data sets for auxiliary storage by paging data to the preferred storage medium first, based on response times, data set characteristics, and other parameters.  Wherever possible, the system will page first to Flash Express resulting in faster performance.  Especially for data intensive applications the use of Pageable Large Pages with Flash Express enables the transfer of large amounts of data at faster speeds, which can result in improved performance for DB2 analytic workloads.

NB. Because Flash Express is not persistent across IPL events, it cannot be used for Virtual I/O or PLPA data used in warm starts.  VIO and PLPA datasets must still be defined on DASD.

During diagnostic collection, as in SVC or standalone dumps, IBM states that systems can become sluggish effectively rendering key systems unavailable.  When data is transferred into main memory as part of a dump event, the fast I/O rates and low latency associated with Flash Express provide decreased first failure data capture time, and faster page-ins of the critical pages needed for dump creation.  This allows the system to return to normal workload performance faster, without incurring extra delays.

So clearly, adding a flash memory layer to the Mainframe storage hierarchy can only deliver benefit, and over time, seemingly Flash Express can be used for other I/O intensive applications, further improving response times and decreasing associated CPU cycles.

No doubt the non-Mainframe technician might say “so what, we’ve been doing flash and SSD for nearly 20 years.  Typical Mainframe, always behind the rest of the IT world”!

Hmmm, perhaps not, hence the Back To The Future reference.  The exact release date depends on your viewpoint, and perhaps urban myth, but in the mid-to-late 1980’s StorageTek introduced a device called the 4080, AKA Control Data Set Manager (CDSM), for their Host Software Component (HSC) product.  You will still see this 4080 device referenced in the HSC Systems Programmer’s Guide technical manual.  The 4080 is a Solid State Device, utilizing memory storage, emulating 3380 and 3390 device types, allowing z/OS (MVS) data sets to be created and accessed, as and if required.  Arguably if not certainly, this was the first instance of logical and physical device separation for Mainframe disks, and urban myth might dictate that this StorageTek 4080 technology was used or at least borrowed for the first EMC Symmetrix disk subsystem…

StorageTek typically packaged this 4080 device within a NearLine (ATL) solution sale and customers used this SSD device for HSC Control Data Sets, but also other high I/O files, such as the IMS Write-Ahead Data Set (WADS).  Therefore, Flash Express isn’t the first SSD solution for the Mainframe, and similarly, Flash Memory/Cards, USB Sticks, et al, aren’t the first SSD type technology in the IT world.

As somebody (Machiavelli) far wiser than I once said “whoever wishes to foresee the future must consult the past”.  The longevity of the Mainframe, nearly 50 Years of technical innovation, largely dictates that technological ideas are at least borrowed from the Mainframe, whether System Paging, Fibre Channels, Storage Area Networking (SAN), Virtual Storage, Flash/SSD Memory, naming but a few.

Should today’s zEC12 Mainframe customer deploy Flash Express?  Without doubt, but to deliver maximum ROI and benefit, it should be considered as more than a faster paging and dump solution, and we look forward to how IBM will further enhance this product offering.

A Tale of Two Twittees

It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us…

OK, so I’m not Charles Dickens, but recently I was reminded of this most notable of opening lines from a novel, these incredibly wise words, seemingly timeless, when assisting a recruitment consultant fulfil Mainframe technical positions for one of their clients.

So who are the Two Twittees?  As my social networking is limited to LinkedIn, I’m not sure I understand all the terminology, but my analogy is based upon the inexperienced young entry level recruit to the Mainframe world, and the seasoned Mainframe professional with several decades of real-life vocational experience.  Can and should these vastly different classes of technical resource work together in today’s technology driven and seemingly endless social media resource aware world?

The simple answer was forever thus, yes, they must work together, but when we reflect upon that opening line from Charles Dickens, the gap between the young fresher and the seasoned sage perhaps has never been wider.  Why?  Perhaps largely attributable to the question of “common sense”, where perhaps the more experienced sage doesn’t answer questions instantaneously, doesn’t publish all sorts of information on social media sites various, and is used to being a problem solver, and when their career started, perhaps didn’t have instantaneous access to a wealth of electronically stored information (E.g. Technical Manuals, ISV/IHV Problem/Solution Resources, Forums, Blogs, et al).  Conversely the inexperienced fresher has access to all of this information, without having done the job first, and so maybe they can be inclined to think that they can be an expert in weeks and months.  This clearly is not the reality, regardless of IT platform, including the IBM Mainframe.

Recently I observed a quote from such a person posted on LinkedIn, something along the lines of “paper manuals, I heard that they existed, I always thought it was an urban myth”!  Flippant or tongue in cheek maybe, but that’s the way we worked in the 1980’s, paper based, evolving slowly into IBM BookManager resources, initially via PC and CD-ROM!  Additionally, you can witness numerous examples of potential job candidates asking how to prepare for an interview via social networking sites; what crib sheets and FAQ type resources can assist them in gaining a job!  Surely, if you don’t know your subject matter, you perhaps shouldn’t be applying for the job?  However, OK, secure an interview, but “tell the world” that you don’t know your subject matter?  Surely, a modicum of common sense will tell you that this is not the way to inspire confidence in a potential employer, and thus perhaps you’re the one that’s impacting your employment chances, whether as an employee or sub-contractor…

It seems somewhat of a paradox that at 50 years old, most Mainframe people are considered to be too old, maybe being offered early retirement, or perhaps not being considered for any new job positions, because of their age.  One must draw one’s own conclusions accordingly.  Conversely, the younger and eager, perhaps recently graduated student, will be perceived as the future for a potential employer, but they have no experience.  Additionally, such a person might have inadvertently or otherwise harmed their reputation by what they might or might not have posted on social networking sites such as Twitter, Facebook, LinkedIn, et al.  It is somewhat bemusing to this social networking luddite, why any individual, regardless of age, can’t comprehend that recruitment consultants, company Human Resource departments and largely anybody involved in the recruitment/employment process will perform an Internet and social media search to determine the suitability of a candidate.

In this instance, the fat fingered mature Mainframe person, who perhaps can’t or won’t tweet, text or connect, uses their common sense and does their job.  Their experience, both as a human being and a Mainframe technician, is best deployed by passing on these attributes to the new Mainframe freshers.  Does this always happen?  Once again, ad nauseam, if the experienced Mainframe resource is dismissed or overlooked at the age of 50 or older, how can they pass on such experience?

Shortly, on 16th May 2013, the UK GSE 101 working group is having its inaugural meeting at the University of Bedfordshire, Luton.  The UK GSE 101 working group is a new group aimed at those new to System z, discussing a wide range of topics for those new to the environment.  What a fantastic stake in the ground for the IBM Mainframe newbie/fresher to meet with peers and industry colleagues, hopefully both young and old.  I wish them well and hope to see this group go from strength to strength over the years.

So, back to my recent activity with the recruitment consultant!  Their thought processes and tick boxes, pseudo or otherwise, for vetting people must be recognised.  Whatever age you are, be really careful how you use social networking sites, but this seemingly is more pertinent for the younger person.  The older person, seemingly being 50 or over, does have a major challenge in the employment world, but perhaps their best opportunity is being allowed to transfer their knowledge to newer Mainframe recruits.  Therefore, perhaps they have to evolve and participate in movements such as the UK GSE 101 working group, but ideally, employers and those folks in the recruitment industry might look to work together and consider leveraging the experience of more mature IT personnel, expediting the training of Mainframe freshers, and more importantly, transferring their “common sense” aptitude…

So even in 1859, Charles Dickens knew exactly what he was talking about, two sides to the coin, yin and yang, and so on.  For some it can be the best of times, and for some it could be the worst of times, but by combining youth with experience, perhaps these current times can be good for all age demographics in the Mainframe workplace, where everybody wins, and yes that does include the Employer!

FICON (Fibre Connection Channel): 15 Years of Mainframe I/O Improvements

In 1998, IBM introduced FICON channels for enhanced I/O connectivity and performance for their 9672 G5 processors, delivering significant capability when compared to its predecessor, ESCON.  Let’s not forget that ESCON (Enterprise Systems Connection – S/390) was the first iteration of fibre optic channel for the IBM Mainframe, delivering significant capability when compared with the previous technology of heavy, large and costly copper based bus & tag parallel (S/370) channels.

ESCON channels were first introduced in the early 1990’s, but after less than a decade, the data and associated storage device explosion was exposing the technical limitations of ESCON, for example:

  • Mainframe Server Channel Support: One IBM Mainframe processor could only support 256 ESCON channels, whereas FICON was offering a ~5-8:1 reduction in channel requirements.  Put another way, a customer could expect to consolidate the number of channels required from ~200 ESCON to ~30-40 FICON.
  • Device Support: One ESCON channel could support up to 1024 devices (sub-channel/device numbers), whereas a 9672 FICON channel increased support 16 fold, up to 16,384 (16 K) devices.
  • Distance: The performance of ESCON dropped off significantly when the distance between the channel and associated Control Unit was greater than ~9 KM.  FICON increased this distance separation to ~100 KM, paving the way for the Geographically Dispersed Parallel Sysplex (GDPS) topologies we take for granted today.
  • Performance: ESCON performance was limited to 17 MB/S, whereas the first evolution of FICON channels delivered 100 MB/S full-duplex performance.

Clearly the first iteration of FICON technology delivered significant benefit to the IBM Mainframe User, and arguably is the primary Mainframe evolution that has sustained data growth and the adoption of Disaster Recovery and Business Continuity resiliency.  So, what does FICON offer today, some 15 years later?

Just as FICON superseded ESCON, FICON Express has now superseded FICON, offering a technology base that can continue to deliver benefit for many years to come.  FICON Express continues the tradition of offering more capabilities with each new generation of FICON channel.  The features were designed with the future in mind, while remembering the past: they support the data serving leadership of System z and enable improved data access using High Performance FICON (zHPF), while providing backwards compatibility by auto-negotiating link data rates of 2, 4 or 8 Gbps across the various FICON Express iterations (FICON Express2, FICON Express4 and FICON Express8).

High Performance FICON for System z (zHPF) is a data transfer protocol that is optionally deployed for accessing data from IBM Mainframe compatible storage subsystems (E.g. IBM DS8000, EMC Symmetrix V-Max, HDS USP, et al) and other subsystems.  Initially the data types supported were DB2, PDSE, VSAM, zFS and Extended Format SAM, and more recently, legacy access methods including QSAM, BPAM and BSAM have also been supported.  zHPF leverages the potential of FICON channels to deliver significant performance enhancements, and can help reduce the infrastructure costs for System z I/O by efficiently utilizing I/O resources, minimizing CHPID (Channels), Fiber (Cables), Switch Ports (E.g. Cisco, Brocade) and Control Unit (E.g. Disk Subsystem) resource requirements.  zHPF also complements the Extended Address Volumes (EAV) strategy for growth, increasing I/O rate capability as the associated disk volume size increases.

The latest generation FICON Express8S channel has two possible modes of operation designed for connectivity to servers, switches/directors, disks, tapes and printers:

  1. CHPID Type FC: FICON, zHPF, and channel-to-channel (CTC) traffic for the z/OS, z/VM, z/VSE, z/TPF, and Linux on System z environments
  2. CHPID Type FCP: Fibre Channel Protocol (FCP) for attachment to SCSI devices for the z/VM, z/VSE, and Linux on System z environments

With FCP channel full fabric support, multiple switches/directors can be placed between the System z server and SCSI device, allowing many “hops” through a storage area network (SAN) and providing improved utilization of intersite-connected resources and infrastructure.  This may help to provide more choices for storage solutions or the ability to use existing storage devices and can help facilitate the consolidation of Distributed Systems servers (E.g. UNIX, Wintel) onto System z servers, protecting investments in SCSI-based storage.

I/O performance improvement rates for the initial iterations of FICON when compared to ESCON and then FICON Express when compared to FICON, and more latterly zHPF have been significant.  Using like-for-like benchmark performance studies, we can see significant performance improvements:

I/O Driver @ 4K Block Size – ~ I/Os Per Second

Channel Type                #I/Os per Sec   n:1 Increase
ESCON                       1200            N/A
FICON Express 2/4 Native    14000           11.7
FICON Express 2/4 zHPF      31000           2.3
FICON Express 8 Native      20000           1.5
FICON Express 8 zHPF        52000           2.6
FICON Express 8S Native     23000           1.2
FICON Express 8S zHPF       92000           1.8

NB. Maximum performance is server related (E.g. z10, z114, z196, zEC12).

The latest 8 Gbps FICON channel exploiting zHPF delivers ~76 times the I/O rate of ESCON, and the zHPF figures show the I/O rate increasing by at least 50% from generation to generation.
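
The ~76 times figure follows directly from the I/O rates in the table above.  The short sketch below simply repeats that arithmetic for every channel type; the dictionary and function names are illustrative.

```python
# I/Os per second at 4K block size, taken from the table above.
iops = {
    "ESCON": 1200,
    "FICON Express 2/4 Native": 14000,
    "FICON Express 2/4 zHPF": 31000,
    "FICON Express 8 Native": 20000,
    "FICON Express 8 zHPF": 52000,
    "FICON Express 8S Native": 23000,
    "FICON Express 8S zHPF": 92000,
}

def ratio_to_escon(rates):
    """Express every channel type as a multiple of the ESCON rate."""
    base = rates["ESCON"]
    return {name: round(value / base, 1) for name, value in rates.items()}

for name, ratio in ratio_to_escon(iops).items():
    print(f"{name:28s} {ratio:5.1f}x")
```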

I/O Driver Mixed Read/Write – ~ MBs Per Second

Channel Type                #MBs per Sec    n:1 Increase
ESCON                       12              N/A
FICON Express 4 Native      350             29.2
FICON Express 4 zHPF        620             1.8
FICON Express 8 Native      620             1.8
FICON Express 8 zHPF        770             1.3
FICON Express 8S Native     620             1.0
FICON Express 8S zHPF       1600            2.1

NB. Maximum performance is server related (E.g. z10, z114, z196, zEC12).

The latest 8 Gbps FICON channel exploiting zHPF delivers ~133 times the data throughput of ESCON, with the FICON Express 8S zHPF generation roughly doubling the throughput of its FICON Express 8 zHPF predecessor.

Once again, the backwards compatibility capability of the IBM Mainframe server is highlighted by the evolution of the FICON channel, and in particular by the Disk Subsystem IHVs, obviously IBM themselves, but notably EMC, HDS and Oracle (StorageTek), in evolving their offerings to support the latest FICON technologies.

We sometimes might take for granted how much data can be stored by a single footprint IBM Mainframe and how much performance and throughput capability is available to process this data.  However, we shouldn’t underestimate what role FICON has played in allowing this significant data (I/O) processing capability to grow, often rapidly, sometimes exponentially.

If there is a downside, such performance attributes might have eradicated the skills required to tune I/O subsystems, but that’s perhaps a subject matter for another day…

IBM Mainframe: Workload License Charges (WLC) Pros & Cons

It is estimated that less than half of eligible IBM Mainframe customers deploy the VWLC pricing mechanism, which in theory, is the lowest cost IBM software pricing metric.  Why?  In the first instance, let’s review the terminology…

Workload License Charges (WLC) is a monthly software license pricing metric applicable to IBM System z servers running z/OS or z/TPF in z/Architecture (64-bit) mode.  The fundamental ethos of WLC is a “pay for what you use” mechanism, allowing a lower cost of incremental growth and the potential to manage software cost by managing associated workload utilization.

WLC charges are either VWLC (Variable) or FWLC (Flat).  Not all IBM Mainframe software products are classified as VWLC eligible, but the major software is, including z/OS, CICS, DB2, IMS and WebSphere MQ, where these products are the most expensive, per MSU.  What IBM consider to be legacy products, are classified as FWLC.  More recently a modification to the VWLC mechanism was announced, namely AWLC (Advanced), strictly aligned with the latest generation of zSeries servers, namely zEC12, z196 and z114.  For the smaller user, the EWLC (Entry) mechanism applies, where AEWLC would apply for the z114 server.  There is a granular cost structure based on MSU (CPU) capacity that applies to VWLC and associated pricing mechanisms:

Band       MSU Range
Base       0-3 MSU
Level 0    4-45 MSU
Level 1    46-175 MSU
Level 2    176-315 MSU
Level 3    316-575 MSU
Level 4    576-875 MSU
Level 5    876-1315 MSU
Level 6    1316-1975 MSU
Level 7    1976+ MSU

Put simply, as the MSU band increases, the related cost per MSU decreases.
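
As a small illustration, the band table above can be turned into a lookup.  The band boundaries come from the table; the function name is hypothetical, and no prices are included because none are given here.

```python
# (band name, lowest MSU, highest MSU); None means no upper limit.
BANDS = [
    ("Base", 0, 3),
    ("Level 0", 4, 45),
    ("Level 1", 46, 175),
    ("Level 2", 176, 315),
    ("Level 3", 316, 575),
    ("Level 4", 576, 875),
    ("Level 5", 876, 1315),
    ("Level 6", 1316, 1975),
    ("Level 7", 1976, None),
]

def band_for(msu):
    """Return the name of the band that a whole-number MSU value falls into."""
    for name, low, high in BANDS:
        if msu >= low and (high is None or msu <= high):
            return name
    raise ValueError("MSU must be a non-negative whole number")

for msu in (3, 4, 200, 5000):
    print(msu, band_for(msu))
```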

IBM Mainframe users can further implement cost control by specifying how much MSU resource they use by deploying Sub-Capacity and Soft Capping techniques.  Defined Capacity (DC) allows the sizing of an LPAR in MSU, and so said LPAR will not exceed this MSU amount.  Group Capacity Limit (GCL) extends the Defined Capacity principle for a single LPAR to a group of LPARs, allowing MSU resource to be shared accordingly.  A potential downside of GCL is that one LPAR of the group can consume all available MSU due to a rogue transaction (E.g. loop).

Sub-Capacity software charges are based upon LPAR hardware utilization, where the product runs, measured in hourly intervals.  To smooth out isolated usage peaks, a Rolling 4-Hour Average (R4HA) is calculated for each LPAR combination, and so software charges are based on the Monthly R4HA peak of appropriate LPAR combinations (I.E. where the software product runs) and not based on individual product measurement.

Once a Defined Capacity LPAR is deployed, this informs WLM (Workload Manager) to monitor the R4HA utilization of that LPAR.  If the LPAR R4HA utilization is less than the Defined Capacity, nothing happens.  If the LPAR R4HA utilization exceeds the Defined Capacity, then WLM signals to PR/SM and requests that Soft Capping be initiated, constraining the LPAR workload to the Defined Capacity level.
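
The interaction between the R4HA and the Defined Capacity can be sketched as below.  This is a simplified model, not WLM or PR/SM logic: it assumes one MSU sample per hour, averages early samples over however many hours are available, and uses hypothetical usage figures.

```python
def rolling_4h_average(hourly_msu):
    """Rolling average over the current and previous three hourly samples.

    Early samples are averaged over however many hours are available,
    which is a simplifying assumption for this sketch.
    """
    averages = []
    for i in range(len(hourly_msu)):
        window = hourly_msu[max(0, i - 3):i + 1]
        averages.append(sum(window) / len(window))
    return averages

def capping_hours(hourly_msu, defined_capacity):
    """Indexes of the hours in which the R4HA exceeds the Defined Capacity."""
    r4ha = rolling_4h_average(hourly_msu)
    return [i for i, avg in enumerate(r4ha) if avg > defined_capacity]

usage = [100, 120, 300, 320, 310, 150, 110, 100]   # hypothetical MSU per hour
print(rolling_4h_average(usage))
print(capping_hours(usage, defined_capacity=250))
```

In this sample the busiest single hour (320 MSU, index 3) does not trigger capping, because the R4HA at that point is still only 210 MSU; capping would start an hour later, once the average itself exceeds the Defined Capacity of 250 MSU.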

If a user chooses a Sub-Capacity WLC pricing mechanism, they will be required by IBM to submit a monthly Sub-Capacity Reporting Tool (SCRT) report.  Monthly WLC invoices are based upon hourly utilization metrics of LPAR hardware utilization, where the software product executes.  The cumulative R4HA and bottom line WLC billing metric is calculated for each product and associated LPAR group and not based on individual product measurement.

Bottom Line: From a Soft Capping viewpoint, the customer only pays for WLC software based upon the Defined Capacity (DC) or Rolling 4-Hour Average (R4HA), whichever is the lower.  So whether a customer uses Soft Capping or not, in all likelihood, there will be occasions when their workload R4HA is lower than their zSeries server MSU capacity.
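
The "whichever is the lower" rule can be expressed in a couple of lines.  This is a deliberately simplified sketch and not the SCRT calculation: it assumes a single LPAR, takes a ready-made list of hypothetical R4HA values for the month, and ignores product and LPAR combinations.

```python
def billable_msu(r4ha_values, defined_capacity=None):
    """Monthly billing peak: the highest R4HA, limited by the Defined Capacity if one is set."""
    if not r4ha_values:
        raise ValueError("at least one R4HA value is required")
    peak = max(r4ha_values)
    if defined_capacity is None:
        return peak
    return min(peak, defined_capacity)

r4ha = [100.0, 110.0, 210.0, 262.5, 270.0, 222.5]   # hypothetical R4HA values
print(billable_msu(r4ha))                            # no soft cap
print(billable_msu(r4ha, defined_capacity=250))      # soft cap at 250 MSU
print(billable_msu(r4ha, defined_capacity=400))      # cap never reached
```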

So, at first glance, VWLC seems to provide a compelling pricing metric, based upon Sub-Capacity and a pay for what you use ethos, and so why wouldn’t an IBM Mainframe user deploy this pricing metric?

The IBM Planning for Sub-Capacity Pricing (SA22-7999-0n) manual states “For IBM System z10 BC and System z9 BC environments, and z890 servers, EWLC pricing is the default for z/OS systems, and Sub-Capacity pricing is always the best option.  For IBM zEnterprise 114, environments, AEWLC pricing is the default for z/OS systems, and Sub-Capacity pricing is always the best option.  For IBM zEnterprise 196, System z10 EC and System z9 EC environments, and other zSeries servers, Sub-Capacity pricing is cost-effective for many, but not all, customers.  You might even find that Sub-Capacity pricing is cost effective for some of your CPCs, but not others (although if you want pricing aggregation, you must always use the same pricing for all the CPCs in the same sysplex)”.

Conclusion: For all small Mainframe users qualifying for the EWLC (AEWLC) pricing metric, arguably this pricing mechanism is mandatory.  For the majority of larger Mainframe users, the same applies, although a granularity of adoption might be required.  IBM also have a disclaimer “Once you decide to use Sub-Capacity pricing for a specific operating system family, you cannot return to the alternative pricing methods for that operating system family on that CPC.  For example, once you select WLC you may not switch back to PSLC without prior IBM approval”.  However, as this wording implies, an exit option does exist; with prior IBM approval, the customer can switch back to the PSLC pricing metric.

Some IBM Mainframe users might object to a notion of Soft Capping, relying upon their tried and tested methodology of LPAR management via the number of CPs allocated and associated PR/SM Weight.  This is seemingly a valid notion and requirement, prioritizing performance ahead of cost optimization.

Conclusion: As previously indicated, with VWLC, SCRT invoices are generated on the premise that the customer only pays for WLC software based upon the Defined Capacity (DC) or Rolling 4-Hour Average (R4HA), whichever is the lower.  So the VWLC pricing mechanism should deliver a granularity of cost savings, typically higher for a Soft Capping environment.

Some IBM Mainframe users might just believe that nothing can match their Parallel Sysplex Licensing Charge (PSLC) mechanism, first available in the late 1990s, which might be attributable to other 3rd party ISVs who cannot and will not allow for their software to be priced on a Sub-Capacity basis.  In reality, adopting the VWLC pricing mechanism delivers ~5% cost savings when compared with PSLC, as indicated by the IBM Planning for Sub-Capacity Pricing Manual (SA22-7999-0n) and related Sub-Capacity Planning Tool (SCPT).

Conclusion: Adopting Sub-Capacity based pricing metrics can only be a good thing.  If your 3rd party ISV supplier doesn’t recognise Sub-Capacity pricing, whether MIPS or MSU based, perhaps you should consider your relationship with them.  Regardless, the z10 server was the last IBM Mainframe to incorporate the “Technology Dividend” solely based on faster CPU chips.  The lower cost WLC pricing metric is now only available with the AWLC and related (E.g. AEWLC) pricing metrics, as per the z196, z114 and zEC12 servers.

Some customers might state that there is a lack of function or granularity of policy definition for IBM supplied Soft Capping (E.g. DC, GCL) or Workload Management (WLM) techniques.  To some extent this is a valid argument, but wasn’t it forever thus with IBM function?  Sub-Capacity implementation is possible via IBM, as is Workload Management (WLM), Soft Capping or not, but should the customer require extra functionality, 3rd party software solutions are available.

The zDynaCap software solution from zIT Consulting delivers a “Capacity Balancing” mechanism, integrating with R4HA and WLM methodologies, but constantly monitoring MSU usage to determine whether CPU resource can be reallocated to Mission & Time Critical workloads, based upon granular customer policies.  The only guarantee in a multiple LPAR environment, for a Mission & Time Critical LPAR to receive all available MSU resource, Soft Capping or not, is to inactivate all other LPARs!  Clearly this is not an acceptable policy for any installation, and so a best endeavours policy applies for PR/SM DC, GCL and Weight settings.

Conclusion: z/OS workloads change constantly, whether by time of day (E.g. On-Line, Batch), period of the year (E.g. Weekly, Monthly, Quarterly, Yearly) or just by customer demand (E.g. 24 Hour Transaction Application).  Therefore a dynamic MSU management solution such as zDynaCap is arguably mandatory, implementing the optimum MSU management policy, whether for purely performance reasons, ensuring that the Mission & Time Critical workload isn’t impacted by lower priority workloads, or for cost reasons, optimizing MSU usage for the best possible monthly WLC cost.

In conclusion, not considering and arguably not implementing z/OS VWLC related pricing mechanisms is impractical, because:

  • The VWLC and AWLC related pricing metrics deliver the lowest cost per MSU for eligible z/OS software
  • When compared with PSLC, VWLC related pricing mechanisms deliver conservative ~5% cost savings
  • Sub-Capacity pricing is a pay for what you use mechanism, not based on the installed MSU capacity
  • If extra MSU policy management granularity is required, consider 3rd party software such as zDynaCap

Software cost savings are not just for the privileged; they’re for everyone!

IBM Mainframe – Enterprise Software License Agreements Pros & Cons

An often quoted phrase in the Mainframe user base is “why are our Mainframe software costs so high”?  Sometimes we might have to look closer to home when finding the answers to our questions…

Over the years, Mainframe software portfolios in the customer environment might have become unwieldy, with duplication of software function, unused software, unsupported software products, and so on.  Typically this scenario occurs due to Merger & Acquisition (M&A) activity, where in an ideal world, a standard LPAR (image) with an optimally configured software portfolio would be deployed, which inevitably will generate the requirement for a modicum of migration activity, from one software product to another.  The complexity of software migration can range from simple, generally associated with Systems Management (E.g. Monitors) products, to enormously complex, generally involving Database Subsystems (E.g. Adabas, DB2, IDMS, et al) and Programming Languages (E.g. COBOL, PL/I, et al).  There is some middle ground with those Systems Management products (E.g. Security, Storage, Scheduling) that maintain metadata (policy data).  Therefore only the truly committed Mainframe user will adopt and fully commit to this standard LPAR methodology, benefitting to some extent from lower software costs.

Similarly over the last 20 years or so, the perceived requirement for Enterprise Software License Agreements has increased, where the fundamental premise is that such agreements make life easier for both the customer and ISV alike.  An interesting notion indeed, and one must draw one’s own conclusions as to whether such a utopia can exist; therefore as always, the caveat emptor (let the buyer beware) term must apply!

However, with such fully encompassing requirements and associated pricing mechanisms, the need for each and every major ISV to have a fully rounded software portfolio has ensued.  Therefore we have witnessed a lot of M&A activity in the Mainframe ISV market place, where several dominant players have emerged, in no particular order, BMC (Advantage), CA (FlexSelect, MLP, OLP) and IBM (ESSO, ELA), while some might say ASG should be included in this list.  Generally it seems to be the norm that each and every Mainframe customer will have at least one Enterprise Software License Agreement in place, typically with IBM because of the need to deploy the z/OS (z/VM, z/VSE, zLinux) operating system, generally in conjunction with one other, whether ASG, BMC or CA.

The advantages of an Enterprise Software License Agreement are primarily:

  • Simplified license management via many products from one supplier
  • A several (3-5) year license agreement, only requiring periodic review and negotiation
  • Perceived cost benefit, with discount based upon volume, both in terms of software and CPU power
  • Perceived deployment benefit, treating Distributed and Mainframe platforms equally
  • Simplified support, as each and every software product should have the same look and feel

However, for a balanced review, we must identify the potential disadvantages, for example:

  • Is each and every software product from this single supplier the best for our business?
  • How do we renegotiate this agreement, because our business requirements have unexpectedly changed?
  • How do we exit this agreement, because our relationship with this supplier has failed?
  • How do we calculate a tangible cost and value for each and every product we deploy?

As always, the devil is in the detail, and although most pros and cons seem fairly innocuous at first glance, the considerations generated regarding contract termination or renegotiation are significant.  For example, if the Mainframe user chooses a 3 year Enterprise Software License Agreement, do they need to decide at least 18 Months before contract expiration that they must migrate to alternative software products, to terminate their relationship with a supplier?  So at first glance, volume discount and simplification look good, but how expensive and disruptive will contract termination be?

In real-life human terms, this is somewhat analogous to Marriage, a long-term relationship between two parties that choose to declare significant commitment to one another, but perhaps, the realm of possibility exists that said relationship will fail, and of course, in the absence of a bulletproof pre-nuptial, complications occur, and exit from the relationship is both financially expensive and disruptive.  Hmmm, so where is the equivalent of a pre-nuptial for the Enterprise Software License Agreement?  In an ideal world, the commercially savvy customer will have planned for such a possibility, but whether they have or have not, the supplier will have been paid for their software, and the customer may not have any choice but to renew or extend their agreement!  So which party is the winner and which one is the loser in such a scenario?  Does one party benefit from a heads we win and tails you lose proposition?

How does the Mainframe customer choose the best software product for their business requirement?  In an ideal world, they document their business requirement, collect information on market place offerings, review pricing options, generate a shortlist of suitable products, and eventually choose the “best-of-breed” product.  How is such a structured and balanced approach possible when deploying the Enterprise Software License Agreement?  The first thought must be cost based, as software has already been paid for, so if there’s a product in the portfolio we could use, we need to use it, whether it’s the best product or not.

If a Mainframe user is using an internal chargeback system for computing use, how can they fairly cost the pricing metric, if they don’t know the price of software products used?  Equally, how can the Mainframe user attempt to identify single product pricing when Enterprise Software License Agreements detail no granularity of pricing information?  Perhaps a modicum of research might help, where some global Government regulations dictate that contract details must be published for public scrutiny.  Therefore ISV Mainframe software list pricing details can be identified, for example, IBM and BMC.

One must draw one’s own conclusions, where some Mainframe customers may perform a structured review of the market place, and even though the technical recommendation might be for a product not covered by an Enterprise Software License Agreement, typically from a smaller ISV, the product chosen is one already paid for, or at least available from the Enterprise Software License Agreement.  This generates several issues, including but not limited to, alienating the smaller ISV community, having used them for expediency, and not delivering the best solution for your business…

So does the self-fulfilling prophecy ensue, where the Mainframe customer questions the cost of Mainframe software, but perhaps implicitly or unknowingly, said Mainframe user has contributed to such an environment, where a limited number of Mainframe ISVs control the Mainframe software market?

Isn’t it somewhat of a paradox that in The UK, the monopolies commission would review the merits of an M&A between two major grocery supermarket or energy supplier companies, and yet whether in The UK or globally, there are several major ISVs (E.g. ASG, BMC, CA, IBM) dominating the Mainframe software market, primarily via Enterprise Software License Agreements?  Can this really be a good thing for the Mainframe user, limited supplier choice and therefore a lack of healthy competition?

Perhaps it is the responsibility of the Mainframe user to actually choose software impartially, and from time-to-time choose the best product, regardless of ISV.  This might generate a more active market place for software choice.  It was forever thus that the larger ISV is so big that they can easily acquire the smaller ISV who has developed and sold a good product, but at least the Mainframe ISV market place continues to evolve.  In this case, it seems somewhat logical that the Mainframe user is in control of their destiny, but only by ensuring that their default option is not the Enterprise Software License Agreement.  They encourage an active and impartial ISV software market by dispassionately reviewing the open market and choosing the best Mainframe software product for their business!

A quote often attributed to C. S. Lewis states “integrity is doing the right thing, even when no one is watching”!  When was the last time a major ISV declared an open book policy for your business, offering you flexible options to benefit from their Enterprise Software License Agreement, while allowing you to choose a best-of-breed software product, but not from their software portfolio, giving you a discount (credit note) for their software product that didn’t match your business requirement?

COBOL – A Viable Programming Language?

For the last twenty years or so I have encountered many scenarios where Mainframe users consider migration to a Distributed Systems (E.g. Wintel, UNIX/Linux, et al) platform, where more often than not the primary reasons seem to be “green screen” and/or “COBOL is a declining legacy language” based.

Going back to basics, COBOL is a Common Business Oriented Language, although the naysayers might say COBOL is a Completely Obsolete Business Oriented Language; we will perhaps try to be more dispassionate in this discussion…

Industry Analysts have stated that there are ~220 Billion lines of COBOL code and ~100,000 programmers and that COBOL applications process ~80% of business transactions daily, and that there are ~200 times more COBOL transactions processed daily, when compared with Google searches!  A lot of numbers and statistics, but seemingly COBOL is still widely used and accepted.  Even from a new development viewpoint, ~5 Billion lines of COBOL code per annum (~15% of Annual Global Development) is stated, suggesting that COBOL is not in any way obsolete or legacy, so why is COBOL perceived by some in a dubious manner?

Maybe because COBOL was introduced in 1959 and primarily it is deployed on the Mainframe, and so anything that is 50+ years old and has an association with the Mainframe just has to be dubious, doesn’t it?  Of course not, as this arguably “pioneering” or at least one of the first “widely deployed” programming languages allowed many global and significant businesses to grow, in tandem with the IBM Mainframe platform, automating and streamlining business processes, increasing productivity and so on.  So depending on your viewpoint, COBOL was either in the right place at the right time, stimulating the Data Processing (DP) and Information Technology (IT) revolution, or COBOL just got lucky, it was “Hobson’s Choice”…

Although there have been several iterations of COBOL standards (I.E. COBOL-68, COBOL-74, COBOL-85), primarily associated with the American National Standards Institute (ANSI) and more recently COBOL 2002 (ISO), a COBOL program that was written and compiled on an IBM Mainframe several decades ago will most likely still run on the latest generation IBM Mainframe.  Put another way, its backward compatibility has been significant, and although there were some migration considerations associated with the Language Environment (LE), the original COBOL Application Development investment has generated a readily usable Return On Investment (ROI) over and over again.  How true is this for other programming languages and computing platforms?  For the avoidance of doubt, a COBOL program that was written in 16-bit, can still run today on a 64-bit platform, and with a modicum of evolution, fully exploit the latest functionality and 64-bit performance, with minimal fuss.  By contrast, how many revolutionary or significant upgrades have been required for Commercial Off The Shelf (COTS) software and associated bespoke application development code, to upgrade non-Mainframe platforms from 16-32-64-bit?

So, is COBOL a viable programming language of the future?  One must draw one’s own conclusions, but we can look to recent functional enhancements and statements of direction from an IBM Mainframe viewpoint.

In recent years IBM have actually increased the number of COBOL R&D personnel by ~100%, while increasing allocated investment, commitment and interest accordingly.  This observation, more than any other, suggests that at least from an IBM Mainframe viewpoint, COBOL is an important function.

From a technical function viewpoint, the realm of possibility exists with COBOL, interacting with all 21st century programming and function techniques, dismissing the notion that COBOL can only be considered as a traditional/legacy option for CICS-Batch applications and associated “green screen” environments, for example:

  • Support for CICS integrated translator
  • Support for latest SQL data types in syntax via DB2
  • Support for Java interoperability via object-oriented COBOL syntax
  • Support for access to WebSphere enterprise beans
  • Support for Java SDK
  • Support for XML high speed parsing and validation (UTF-8, UTF-16 & various EBCDIC codepages)

From a strategic statement of direction viewpoint, IBM have declared the following major notable activities:

  • Performance and resource utilization optimization, reducing TCO accordingly
  • Improved middleware (I.E. CICS, DB2, IMS, WebSphere) programmability and problem determination
  • Improved capabilities (E.g. XML, Java, et al) for modernizing & creating business critical applications
  • Improved programmer (E.g. Usability and Problem Determination) productivity
  • Source and load (I.E. recompile not required) compatibility (E.g. old programs can call new and vice versa)

Even for those occasions where the IBM Mainframe platform might be decommissioned, COBOL can still be processed on alternative platforms via code migration techniques such as Micro Focus, where such functions and services can be Cloud based.  However, once again, isn’t the IBM Mainframe the ultimate “Cloud” platform, which has arguably been the case “forever thus”?

One must draw one’s own conclusions as to why the Mainframe platform and/or COBOL applications are often considered for replacement via migration, when the Mainframe platform is both strategic and cost efficient.  As with any technology decision, there is no “one size fits all” solution, but perhaps a little education can go a long way, and at least the acceptance that seemingly “legacy” technologies are strategic and viable.