Mainframe ISV Software: Is Continuous Product Improvement Always Evident?

Ken Venturi once said, “I don’t believe you have to be better than everybody else. I believe you have to be better than you ever thought you could be.”

Wouldn’t it be great if every CTO and/or Product Manager had this same philosophy for their Mainframe software solution?  One such example I have experienced over the years is (E)JES from Phoenix Software International (PSI).  Of course it’s really important to have Day 1 support for the latest operating system release, z/OS 2.1 being the latest example, but what about actually exploiting the latest functionality available with the latest zSeries Mainframe Enterprise Servers and z/OS Operating Systems?

To drive maximum bang from your buck, optimal performance and robust cost optimization are only possible by recognizing and exploiting the latest Mainframe function ASAP, as and when appropriate.  Furthermore, listening to your customers, analysing their feedback, actively participating in User Organizations such as SHARE, and so on, will all help in continuous product development and innovation.

Here are some of the reasons why (E)JES has succeeded over a 30+ year period, recognizing and exploiting new z/OS function as and when the updated z/OS is released for General Availability (GA).  Even today, with Version 5.3 supporting z/OS 2.1 as of Day 1, (E)JES continues to offer value-added function for the seasoned, the inexperienced and, in fact, all IBM Mainframe technicians:

  • 64-bit performance optimizations (I.E. MEMLIMIT: above-the-bar) for both (E)JES client and server components, safeguarding minimal z/OS resource usage.
  • Nearly all (E)JES JES subsystem processing routines are eligible for zIIP redirection, delivering software cost savings for all (E)JES users.  Sub-Capacity System z processor users experience improved (E)JES performance because zIIP engines always run at full speed.  This behaviour differs from that of General Purpose CPs, “throttled” with Sub-Capacity deployments.
  • (E)JES code executes faster via its inbuilt High Performance Routine (HPR) facility, specifically developed to accelerate access to data in JES control blocks.  HPRs have a shorter instruction path length than previous coding techniques, avoiding delays in modern zSeries CPU instruction pipelines.
  • If High Performance FICON (zHPF) is available, (E)JES uses Transport Mode channel programs for JES Spool I/O.  When zHPF is not available, or when a CAS server performs I/O against the global data set, (E)JES uses the highest-performing Command Mode channel programs currently available.  These channel programs perform I/O significantly faster than “ordinary” channel programs.
  • The use of 24-bit (captured) UCBs puts a strain on the 24-bit virtual storage resource.  The use of ordinary (non-extended) TIOT entries puts a limit on the total number of allocations that can exist simultaneously in an address space.  (E)JES supports and uses 31-bit (uncaptured) UCBs and the extended TIOT (XTIOT) function (I.E. NON_VSAM_XTIOT=YES in DEVSUPxx PARMLIB); a sample DEVSUPxx fragment follows this list.
  • (E)JES supports placement of JES spool data sets in the cylinder-managed area of an Extended Address Volume (EAV).  Of course, as of z/OS 1.12, EAV increases 3390 DASD capacity to ~1 TB.
  • (E)JES Pattern Utility Matching uses the SRST hardware instruction.  Empirical measurements show this technique is far faster on modern System z processors than alternatives such as the TRT instruction or “brute force” matching techniques using CLI/CLC.
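
Purely as an illustrative sketch, not a recommendation, the DEVSUPxx PARMLIB fragment below shows the single statement that enables XTIOT support for non-VSAM data sets, as referenced in the list above; the member suffix and any other DEVSUPxx keywords your installation requires are site-specific assumptions.

  NON_VSAM_XTIOT=YES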

One of the primary benefits of upgrading IBM z/OS software is the overall system performance benefit and associated cost reduction, but of course, IBM can only deliver the underlying function and capability; it’s incumbent upon the ISV community to upgrade their software products accordingly.  A key goal for any good ISV software product is to provide value-add in the area of performance.  This has been one of the primary areas of focus for (E)JES since its introduction in 1978.

Most spool display and management products tend to rely on the most resource-intensive interface available, namely the JES subsystem provided SSI 80.  (E)JES benchmarking tests against the most readily-available JES SSI 80 exploiters demonstrate significant CPU savings when deploying (E)JES.

Software products also need to deliver continuous improvements with regard to usability, presentation and in-built function, increasing user and system administrator productivity.  Without doubt, optimization encompasses not just hardware, but software, services, systems management disciplines and “best practices” that tie it all together.  Here are some of the usability enhancements that (E)JES has incorporated:

  • ISPF users running a 3270 emulator on a programmable workstation can now search IBM Eclipse-based InfoCenters via (E)JES.  Although (E)JES fully supports BookManager format documentation, BookManager READ/MVS is now obsolete; beginning with z/OS 2.1, BookManager softcopy books are no longer delivered by IBM.  IBM has stated that InfoCenters, and eventually KnowledgeCenters, are their strategic direction for online documentation.
  • (E)JES Web is a new, browser-based interface to (E)JES.  The associated RESTful API delivering this web-enabled technology provides a framework for the creation of Eclipse plug-ins, mobile applications, and other web services clients (an illustrative REST call sketch follows this list).  This facility will provide a “rapid learning” type capability for (E)JES users, both new and old, who might be uncomfortable navigating traditional 3270 interfaces.
  • (E)JES provides a Java Application Programming Interface (API), complementing other in-built APIs for REXX and procedural languages.  By using an (E)JES API, a user can harness the versatility of their preferred programming language to interface and interact with (E)JES.  This support provides an interface to deliver nearly all of the capabilities available to an interactive (E)JES user.
  • (E)JES incorporates context sensitive help function, with point-and-shoot/pop-up dialogs, helping educate users on (E)JES, JES and z/OS while they work.  Users can get pop-up explanations of columns, input choices for unprotected fields, and a list of line commands.  Smart pop-ups explain the contents of certain columns, such as system abend codes.
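
To illustrate the kind of client a RESTful API enables, the Java sketch below issues a simple HTTP GET; the host name, path and response format are hypothetical placeholders, not the documented (E)JES Web API, so treat this as a minimal sketch of the approach rather than working (E)JES code.

  // Illustrative only: the endpoint below is a hypothetical placeholder,
  // not the documented (E)JES Web REST API.
  import java.net.URI;
  import java.net.http.HttpClient;
  import java.net.http.HttpRequest;
  import java.net.http.HttpResponse;

  public class EjesWebSketch {
      public static void main(String[] args) throws Exception {
          HttpClient client = HttpClient.newHttpClient();
          // Hypothetical resource: a list of active jobs, returned as JSON
          HttpRequest request = HttpRequest.newBuilder()
                  .uri(URI.create("https://zos.example.com/ejes/api/v1/activity"))
                  .header("Accept", "application/json")
                  .GET()
                  .build();
          HttpResponse<String> response =
                  client.send(request, HttpResponse.BodyHandlers.ofString());
          System.out.println(response.statusCode());
          System.out.println(response.body());
      }
  }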

The latest (E)JES Release Information Manual eloquently details the product enhancements over the last 5 releases or so, providing a good Product Roadmap reference point.

So, whether the ISV software product you deploy has been available for several years or several decades, do you safeguard maximum business benefit for optimal cost by considering:

  • Does the ISV deploy the latest zSeries server (I.E. zBC12, zEC12) for software interoperability and full hardware function exploitation; or an emulation (I.E. zPDT) technique?
  • Does the ISV deliver value-added z/OS related function on Day 1 or even within a year of the latest z/OS release?
  • Does the ISV deliver meaningful function to assist your users in deploying said function, while simplifying environment management for system administrators?
  • Does your ISV product optimize cost, with Sub-Capacity pricing in MSU increments and aggregated MSU costs for your entire zSeries Mainframe environment, as opposed to specific workloads (E.g. CPCs, LPARs, et al)?
  • Does your ISV product optimize cost by offloading the majority of its CPU function to zIIP specialty engines, which run at maximum speed, and where software “runs for free”?

Of course, only you can ask and potentially answer these questions during your day-to-day activities of maintaining currency and optimal performance for your Mainframe software portfolio.

Sometimes the hardest questions anybody can ask are the questions they ask themselves, which are never rhetorical questions!  Extracted verbatim from the latest (E)JES Release Information Manual:

Team (E)JES took advantage of the Phoenix Software International zHISR performance analysis product to discover performance “hot spots” in the (E)JES product.  Sometimes the simplest, least conspicuous piece of code turns out to be a major CPU contributor.  See below for some of the most embarrassing “surprise” hot spots we discovered using zHISR in a z/OS 2.1 LPAR:

  • Over 30% of the CPU used during a Spool Data Browse FIND operation, against a multi-million-line SYSOUT in JES2, turned out to be code that was clearing a record buffer to blanks using MVCL.  This clearing code was eliminated and some minor adjustments were made in other code to compensate for this change.
  • 27% of the CPU used to produce the Activity display in JES2 turned out to be in a routine that manages an internal resource called the “Job Positions Table.”  The algorithm was improved (to work more like its JES3 counterpart) and that routine is no longer a significant CPU contributor.
  • 9% of (E)JES session start-up was a 26-year-old “brute force” prime number generator used to compute the size of a hash table.  That code was totally reworked and now accounts for approximately 0.02% of session start-up CPU (a generic illustration of this kind of hash-table sizing follows this list).
  • A 6% performance penalty was observed when sorting a tabular display with a moderate number of rows.  The hot spot turned out to be the code that cleared the work area for the sort service to zeros (another MVCL).  This overhead was reduced to 0.04%.
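
Without claiming any knowledge of the actual (E)JES implementation, the Java sketch below illustrates the general idea behind the third bullet: sizing a hash table to the next prime number, contrasting a “brute force” primality test with an equally simple variant that stops trial division at the square root.  The class and method names are illustrative assumptions.

  // Generic illustration of next-prime hash-table sizing; unrelated to the real (E)JES code.
  import java.util.function.LongPredicate;

  public class NextPrimeSketch {

      // "Brute force": test every candidate divisor below n.
      static boolean isPrimeBruteForce(long n) {
          if (n < 2) return false;
          for (long d = 2; d < n; d++) {
              if (n % d == 0) return false;
          }
          return true;
      }

      // Cheaper: trial division only up to the square root of n.
      static boolean isPrime(long n) {
          if (n < 2) return false;
          for (long d = 2; d * d <= n; d++) {
              if (n % d == 0) return false;
          }
          return true;
      }

      // Smallest prime greater than or equal to the requested table size.
      static long nextPrime(long n, LongPredicate primeTest) {
          long candidate = Math.max(n, 2);
          while (!primeTest.test(candidate)) candidate++;
          return candidate;
      }

      public static void main(String[] args) {
          long requestedSize = 100_000; // hypothetical hash-table size
          System.out.println(nextPrime(requestedSize, NextPrimeSketch::isPrime));
          System.out.println(nextPrime(requestedSize, NextPrimeSketch::isPrimeBruteForce));
      }
  }

Both variants return the same prime; the difference is purely the CPU consumed finding it.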

Mea culpa and humility are never a bad thing, but you have to be honest with yourself and ask yourself the right questions!  So going back full circle and quoting Ken Venturi once again, “I don’t believe you have to be better than everybody else. I believe you have to be better than you ever thought you could be.”  You must draw your own conclusions as to whether such an observation applies to the (E)JES team at Phoenix Software International (PSI)…

Why not ask them yourself?  Ed Jaffe, the (E)JES CTO, will be available at the forthcoming UK GSE Annual Conference, 5-6 November 2013, speaking about (E)JES System Management Software: More With Less For Less, For The z/OS Mainframe and z/OS 2.1 User Experiences.

FICON (Fibre Connection Channel): 15 Years of Mainframe I/O Improvements

In 1998, IBM introduced FICON channels for enhanced I/O connectivity and performance for their 9672 G5 processors, delivering significant capability when compared to its predecessor, ESCON.  Let’s not forget that ESCON (Enterprise Systems Connection) was the first fibre-optic channel technology for the IBM Mainframe (S/390), delivering significant capability when compared with the previous technology of heavy, large and costly copper-based bus & tag parallel (S/370) channels.

ESCON channels were first introduced in the early 1990s, but after less than a decade, the data and associated storage device explosion was exposing the technical limitations of ESCON, for example:

  • Mainframe Server Channel Support: One IBM Mainframe processor could only support 256 ESCON channels, whereas FICON was offering a ~5-8:1 reduction in channel requirements.  Put another way, a customer could expect to consolidate the number of channels required from ~200 ESCON to ~30-40 FICON.
  • Device Support: One ESCON channel could support up to 1024 devices (sub-channel/device numbers), whereas a 9672 FICON channel increased support 16-fold, up to 16,384 (16 K) devices.
  • Distance: The performance of ESCON dropped off significantly when the distance between the channel and associated Control Unit was greater than ~9 KM.  FICON increased this distance separation to ~100 KM, paving the way for the Geographically Dispersed Parallel Sysplex (GDPS) topologies we take for granted today.
  • Performance: ESCON performance was limited to 17 MB/s, whereas the first evolution of FICON channels delivered 100 MB/s full-duplex performance.

Clearly the first iteration of FICON technology delivered significant benefit to the IBM Mainframe User, and arguably is the primary Mainframe evolution that has sustained data growth and the adoption of Disaster Recovery and Business Continuity resiliency.  So, what does FICON offer today, some 15 years later?

Just as FICON superseded ESCON, FICON Express has now superseded FICON, offering a technology base that can continue to deliver benefit for many years to come.  FICON Express continues the tradition of offering more capabilities with each new generation of FICON channel.  These features were designed with the future in mind, while remembering the past, supporting the data serving leadership of System z and enabling improved data access using High Performance functions (I.E. zHPF), while providing backwards compatibility, auto-negotiating link data rates of 2, 4 or 8 Gbps across the various FICON Express2, Express4 and Express8 iterations.

High Performance FICON for System z (zHPF) is a data transfer protocol that is optionally deployed for accessing data from IBM Mainframe compatible storage subsystems (E.g. IBM DS8000, EMC Symmetrix V-Max, HDS USP, et al).  Initially the data types supported were DB2, PDSE, VSAM, zFS and Extended Format SAM; more latterly, legacy access methods including QSAM, BPAM and BSAM are now supported.  zHPF leverages the potential of FICON channels to deliver significant performance enhancements, and can help reduce the infrastructure costs for System z I/O by efficiently utilizing I/O resources, minimizing CHPID (Channels), Fiber (Cables), Switch Ports (E.g. Cisco, Brocade) and Control Unit (E.g. Disk Subsystem) resource requirements.  zHPF also complements the Extended Address Volumes (EAV) strategy for growth, increasing I/O rate capability as the associated disk volume size increases.

The latest generation FICON Express8S channel has two possible modes of operation designed for connectivity to servers, switches/directors, disks, tapes and printers:

  1. CHPID Type FC: FICON, zHPF, and channel-to-channel (CTC) traffic for the z/OS, z/VM, z/VSE, z/TPF, and Linux on System z environments
  2. CHPID Type FCP: Fibre Channel Protocol (FCP) for attachment to SCSI devices for the z/VM, z/VSE, and Linux on System z environments

With FCP channel full fabric support, multiple switches/directors can be placed between the System z server and SCSI device, allowing many “hops” through a storage area network (SAN) and providing improved utilization of intersite-connected resources and infrastructure.  This may help to provide more choices for storage solutions or the ability to use existing storage devices and can help facilitate the consolidation of Distributed Systems servers (E.g. UNIX, Wintel) onto System z servers, protecting investments in SCSI-based storage.

I/O performance improvements have been significant at each step: the initial iterations of FICON compared to ESCON, FICON Express compared to FICON, and more latterly zHPF compared to native FICON Express.  Using like-for-like benchmark performance studies, we can see these improvements clearly:

I/O Driver @ 4K Block Size – ~ I/Os Per Second

  Channel Type              #I/Os per Sec   n:1 Increase
  ESCON                     1,200           N/A
  FICON Express2/4 Native   14,000          11.7
  FICON Express2/4 zHPF     31,000          2.3
  FICON Express8 Native     20,000          1.5
  FICON Express8 zHPF       52,000          2.6
  FICON Express8S Native    23,000          1.2
  FICON Express8S zHPF      92,000          1.8

NB. Maximum performance is server related (E.g. z10, z114, z196, zEC12).

The latest 8 Gbps FICON channel leveraging the zHPF function delivers ~76 times the I/O rate of ESCON, with significant throughput increases from generation to generation.
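
As a quick sanity check of that headline claim, and purely as a worked example using the figures quoted in the table above, the short Java sketch below reproduces the ~76:1 ratio.

  // Worked example: reproduce the ~76:1 claim from the IOPS figures quoted above.
  public class FiconIopsRatio {
      public static void main(String[] args) {
          double esconIops = 1_200;        // ESCON, I/Os per second (4K block size)
          double ficon8sZhpfIops = 92_000; // FICON Express8S with zHPF, I/Os per second

          // Prints roughly "77 : 1", in line with the ~76 times quoted in the text.
          System.out.printf("FICON Express8S zHPF vs ESCON: ~%.0f : 1%n",
                  ficon8sZhpfIops / esconIops);
      }
  }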

I/O Driver Mixed Read/Write – ~ MBs Per Second

  Channel Type              #MBs per Sec    n:1 Increase
  ESCON                     12              N/A
  FICON Express4 Native     350             29.2
  FICON Express4 zHPF       620             1.8
  FICON Express8 Native     620             1.8
  FICON Express8 zHPF       770             1.3
  FICON Express8S Native    620             1.0
  FICON Express8S zHPF      1,600           2.1

NB. Maximum performance is server related (E.g. z10, z114, z196, zEC12).

The latest 8 Gbps FICON channel leveraging the zHPF function delivers ~133 times the data throughput of ESCON, with significant throughput increases from generation to generation.

Once again, the backwards compatibility of the IBM Mainframe server is highlighted by the evolution of the FICON channel, and in particular by the Disk Subsystem IHVs, obviously IBM themselves, but notably also EMC, HDS and Oracle (StorageTek), evolving their offerings to support the latest FICON technologies.

We sometimes might take for granted how much data can be stored by a single footprint IBM Mainframe and how much performance and throughput capability is available to process this data.  However, we shouldn’t underestimate the role FICON has played in allowing this significant data (I/O) processing capability to grow, often rapidly, sometimes exponentially.

If there is a downside, such performance attributes might have eradicated the skills required to tune I/O subsystems, but that’s perhaps a subject matter for another day…

Extended Address Volumes (EAV): Pros & Cons

It wasn’t too long ago that the maximum size of a 3390 DASD volume was ~54 GB (65,520 Cylinders) via the 3390-54.  Then with the release of z/OS 1.10, Extended Address Volumes (EAV) were introduced, and a ~4-fold increase in single device capacity was delivered @ 223 GB (262,668 Cylinders)!  Surely enough storage capacity for anybody?

Of course, we all know that 21st Century data requirements are significant, and so the release of z/OS 1.13 (z/OS 1.12 and PTFs) has delivered another ~4-fold increase, with a single device capacity of 1 TB (~1.182 Million Cylinders).  However, let’s not forget that data storage capacity can increase by ~20%+ per annum; I guess it won’t be too long before we see yet another 4-fold increase in size, to ~4 TB+…

EAV implementation relieves disk capacity constraints and allows storage growth without adding more devices.  In today’s world of TCO optimization and a utopia of very short-term ROI, EAV usage will reduce TCO, primarily personnel and environmental (E.g. Power, Cooling, Floor Space) related.  Potentially the ability to manage more data with fewer DASD volumes simplifies the Storage Administration process, therefore increasing the number of TB managed by each technician.  Typically, additional capacity (EAV) can be added dynamically, increasing DASD volume capacity online via the Dynamic Volume Expansion (DVE) function.

Theoretically (as per current architectural constraints) a 3390 EAV can grow to 225 TB; the realm of possibility exists!

The pros of EAV implementation seem obvious, a significant capacity increase in a single footprint, easy implementation, with demonstrable TCO benefits; but is all that glisters always gold?

Learning from history is always a good thing and if we consider the challenges of adopting the 3390-9/27/54 device, did we encounter any capacity optimization issues?  As a single device increases in size, device occupancy might become a challenge.  For example, 90% occupancy of a 3390-54 @ 54 GB is ~48.5 GB, or put another way, ~5.4 GB of capacity is never used.  So if we apply the same metric to a 1 TB device, you guessed it, ~100 GB is never used…
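
Purely as a worked example of that occupancy arithmetic, the short Java sketch below scales the “never used” figure from a 3390-54 to a 1 TB EAV, assuming the same 90% achievable occupancy.

  // Worked example of the device occupancy arithmetic quoted above.
  public class EavOccupancy {
      public static void main(String[] args) {
          double occupancy = 0.90;   // assumed achievable occupancy
          double mod54Gb   = 54;     // 3390-54 capacity in GB
          double eavGb     = 1_000;  // 1 TB EAV capacity in GB (approximate)

          // ~5.4 GB unused on a 3390-54, ~100 GB unused on a 1 TB EAV
          System.out.printf("3390-54 never used: ~%.1f GB%n", mod54Gb * (1 - occupancy));
          System.out.printf("1 TB EAV never used: ~%.0f GB%n", eavGb * (1 - occupancy));
      }
  }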

So what, you might say?  Indeed the separation of the physical and logical device eliminates any physical space utilization considerations, but what about the number of data sets and, more importantly, extents on that EAV or even 3390-54 DASD volume?  An issue that has plagued many Mainframe installations is disk fragmentation, as no matter how big a DASD volume, sometimes successful data set allocation is dependent upon sufficient contiguous extents to satisfy primary allocation or secondary extension.

At first glance, the process of defragmentation is very simple (DFSMSdss DEFRAG, FDR/CPK COMPAKTOR, et al), but typically these processes require minimal data set allocation activity and are batch orientated.  DASD enqueue time is a consideration, as these traditional Mainframe defrag solutions can generate significant enqueue activity for the VTOC and data sets alike.  Can the 21st Century business that requires near 24*7 data availability allocate sufficient time (E.g. minimal processing window) to perform such manual defragmentation activities?  If only defragmentation could be transparent, automated and dynamic…
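
For reference, a minimal batch defragmentation job might look something like the sketch below; the job card and volume serial are hypothetical placeholders and any site-specific DEFRAG keyword tuning is omitted, the point simply being that this approach is batch orientated and serializes the volume while it runs.

//DEFRAG   JOB (ACCT),'VOLUME DEFRAG',CLASS=A,MSGCLASS=X
//STEP1    EXEC PGM=ADRDSSU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFRAG DYNAM(PROD01)
/*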

RealTime Defrag (RTD) is one such option, deploying a multi-faceted approach to delivering “on-line defrag”:

  1. Release – Release allocated but unused space for all data set types
  2. Combine – Combine extents, reducing the number of allocated extents for optimized performance and SE37 abend eradication
  3. Defrag – Reorganize data sets into contiguous groups, increasing size of free extents, optimizing performance and SB37 abend eradication

In conclusion, EAV deployment can only be a good thing, delivering demonstrable TCO benefits in the form of dramatic single-footprint (I.E. Disk Subsystem) capacity increases.  RealTime Defrag can also increase service availability, eradicating the requirement for manual and batch orientated defrag activities, while safeguarding that installed disk capacity is optimized, EAV or not.