zHyperLink: Just Another System z DASD I/O Function Enhancement?

Over the last several decades the IBM Mainframe platform has delivered several new technologies that have dramatically improved disk (DASD) I/O performance.  Specifically, ESCON introduced fibre optic channels, followed by EMIF for channel sharing and reduced I/O protocol overhead, superseded by FICON and most recently zHPF.  All of these technologies have allowed ever larger amounts of data to be processed by the System z server and supported the adoption of Geographically Dispersed Parallel Sysplex (GDPS) implementations for business continuity reasons.  Ultimately, mission critical data and decisions are facilitated by applications, and sub-second response times for these transactions are expected.  Some might say that we’re always running to stand still from a performance perspective when implementing the latest System z technologies?

In reality, today’s 21st Century mission-critical application is not just capturing and storing customer data; it’s doing so much more, attempting to make informed business decisions for a richer customer experience!  Historically a customer transaction would be on a one-to-one basis (E.g. a balance query), whereas today, said transaction might generate more data for the customer, potentially offering them a new or enhanced product.  In theory, this informed and intelligent transaction processing delivers a richer experience for the customer and potentially new revenue opportunities for the business.

For several years IBM have integrated the Cloud, Analytics, Mobile, Social & Security (CAMSS) initiative into their product offerings, recognising that a business transaction can originate from the cloud or a mobile device, potentially via a Social Media platform, require rich processing via real-time analytics, while demanding the highest levels of security.  Of course, one must draw one’s own conclusions, but maintaining sub-second or ultra-fast transaction response times with this level of CAMSS complexity requires significant performance enhancements.  Delivering such ultra-fast response times requires the DASD I/O subsystem to maintain the highest levels of performance, aligned with the latest System z server platform…

In January 2017 IBM issued a Statement of Direction (SoD) and associated FAQ for their zHyperLink technology.  zHyperLink is a new short distance mainframe attach link technology designed for up to 10 times lower latency than zHPF, intended to accelerate DB2 for z/OS transaction processing and improve active log throughput.  IBM intends to deliver field upgradable support for zHyperLink on the existing IBM DS8880 storage subsystem.  zHyperLink is the result of collaboration between DB2 for z/OS, the z/OS operating system, IBM System z servers and the DS8880 storage subsystem to deliver extreme low latency I/O access for DB2 for z/OS applications.  zHyperLink technology is intended to complement FICON technology, accelerating those I/O requests that are typically used for transaction processing.  These links are point-to-point connections between the System z CEC and the storage system, limited to 150 meter distances, and they do not impact the z/Architecture 8 channel path limit.

From a DB2 I/O service time perspective, at short distances, a native FICON or zHPF originated I/O typically requires 300 Microseconds (μs) for a simple I/O operation.  The coupling facility for z Systems can typically read or write 4K of data in under 8 Microseconds.  zHyperLink technology will provide a new short distance link from the mainframe to storage, to read and write data up to 10 times faster than FICON or zHPF, reducing DB2 I/O service times to an anticipated 20-30 Microseconds.
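
To put those latency figures into context, the minimal sketch below (Python, with a purely hypothetical transaction profile) compares the cumulative synchronous I/O wait per DB2 transaction at the ~300 μs zHPF figure versus the anticipated ~20-30 μs zHyperLink figure; only the latency values are taken from the text, everything else is illustrative.

    # Back-of-the-envelope comparison of per-transaction synchronous I/O wait,
    # using the latency figures quoted above (~300 us for native FICON/zHPF,
    # ~20-30 us anticipated for zHyperLink).  The transaction profile is hypothetical.

    ZHPF_LATENCY_US = 300        # typical simple I/O via native FICON or zHPF
    ZHYPERLINK_LATENCY_US = 25   # mid-point of the anticipated 20-30 us range

    def io_wait_ms(sync_reads: int, latency_us: float) -> float:
        """Total synchronous I/O wait per transaction, in milliseconds."""
        return sync_reads * latency_us / 1000.0

    for reads in (10, 50, 100):  # hypothetical DB2 synchronous reads per transaction
        zhpf = io_wait_ms(reads, ZHPF_LATENCY_US)
        zhl = io_wait_ms(reads, ZHYPERLINK_LATENCY_US)
        print(f"{reads:3d} reads: zHPF ~{zhpf:.1f} ms vs zHyperLink ~{zhl:.2f} ms")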

In conclusion, with a promise of 10 times faster processing, as per its fibre optic channel technology predecessors, notably EMIF and zHPF, zHyperLink is a revolutionary DASD I/O function and not just another DASD I/O subsystem function enhancement.  At this stage, the deployment of zHyperLink functionality is restricted to DB2 and the IBM DS8880 storage subsystem, while we eagerly await compatibility support from EMC and HDS.  Moreover, as per the evolution of zHPF, we hope for the inclusion of other I/O workloads, benefitting from this paradigm-changing I/O response time technology.

Finally, as always, the realm of possibility exists for each and every System z DASD I/O subsystem to be monitored and tuned on a proactive, 24*7*365 basis.  Although all of this DASD I/O performance data has always been and still is captured by RMF (CMF) data, intelligent processing of this data requires an ever evolving Performance Management process and arguably an intelligent software solution (E.g. IntelliMagic Vision (Disk Magic) or Technical Storage Easy Analyze Disk Mainframe (EADM)) to deliver meaningful information and business decisions from ever increasing amounts of RMF (CMF) data.  In November 2016 I delivered the DASD I/O Performance Management Is Easy? session at the UK GSE Annual 2016 meeting…

All Flash & Substance – Is The System z Microsecond The New Millisecond?

Is 2016 the year of the All Flash disk array?  Seemingly from a System z perspective, 2016 has seen improvement in the All Flash disk array offerings from the major disk suppliers, namely EMC, HDS and IBM.  From a usability perspective, managing latency might be the overall challenge, where these ultra-fast SSD systems are delivering I/O response times measured in the ~250-500 Microsecond (μs) range, potentially consigning the traditional Millisecond (ms) measurement to history!

Whatever the speeds and feeds might be, as of 2016, the benchmark for a System z All Flash disk array is seemingly a response time measured @ <500 Microseconds (μs), supporting ~n PB of capacity and delivering ~nnn GB/s throughput for mixed read/write workloads.  Of course, strong encryption, typically full disk Data @ Rest Encryption (D@RE) based, and full seamless data replication interoperability are also mandatory.

Historically we evolved from Data Processing to Information Technology, not only automating the capture and processing of data, but gradually evolving our processes, using this data for business advantage.  In recent years, the information explosion has dictated that each and every business must be a cognitive business, using intelligent analytics to gain insight and faster decision-making from the business data collected.

Currently the Internet of Things (IoT) supplements the medium-term Cloud, Analytics, Mobile, Social & Security (CAMSS) initiative, being the processes and associated solutions required by cognitive businesses to make timely and informed decisions, capturing deeper customer insight, ultimately delivering competitive advantage.  Therefore the 21st century business generates a significant requirement for storage capacity and performance to fully realize the benefits of this truly business aligned cognitive approach.

The largest global organizations from all verticals leverage the power and true 24*7*365 availability and reliability of the System z Mainframe to power enormous relational databases, processing millions of customer transactions on an hourly basis.  These always-on, mission critical business environments demand the performance, reliability, TCO and System z platform integration delivered by the associated DASD (3390) subsystem.

Each and every System z user will have their IHV of choice for delivering disk storage; in alphabetical order, EMC (E.g. VMAX AFA/All Flash Array), HDS (E.g. HAF/Hitachi Accelerated Flash) or IBM (E.g. DS8888).  The choice of disk storage was forever thus, reviewing the market place and choosing the best option for your business.  What might require reflection is how the DASD I/O subsystem is managed and the associated interaction with said IHV supplier.  Systems Management solutions such as Easy Analyze Disk Mainframe (EADM) and IntelliMagic Vision (Disk Magic) will certainly simplify the analysis and presentation of DASD subsystem performance data.

However, the emphasis for an All Flash System z DASD subsystem might move from being an internal consideration to a direct and timely communication with the IHV supplier.  Put very simply, in an environment where Mission Critical systems rely upon ultra-fast processing of massive amounts of data, any flash memory issues, whether capacity or defect related, will need IHV interaction ASAP, arguably “Before The Event”.  As with the System z Server itself, where we’re used to On/Off Capacity on Demand (OOCoD) processes, maybe we need to consider a similar approach with our All Flash System z DASD arrays.  For the avoidance of doubt, as opposed to waiting for an issue to impact our business, maybe we need to work smarter with our IHV, to safeguard that sufficient flash memory is available, proactively resolving capacity or defect issues…

Aligning this with our traditional 3390 DASD I/O subsystem analysis, which might have been performed on a daily basis from the rich RMF/CMF data resource, we must fully automate this process to minimize or eliminate the Mean Time To Resolution (MTTR)!  The ultimate benefit will be the delivery of meaningful messages that incorporate our 3rd party IHV supplier, who, potentially with Remote Support Facility (RSF) type processes, deploys the “Golden Screwdriver” to seamlessly safeguard the performance profiles of our Mission Critical business applications, leveraging the latest All Flash disk array.

In conclusion, as always, technology can deliver business benefits, with substance, and this includes All Flash disk arrays.  What might need to evolve are the associated Systems Management processes.  Therefore, asking yet another potentially rhetorical question: what is more important, the System z Server or timely data access?  The diplomatic answer is that they’re equally important, and if so, let’s safeguard the availability of All Flash memory for our DASD subsystems, with the requisite levels of meaningful proactive reporting and IHV supplier interaction.

System z: Optimizing DASD I/O Subsystem Performance

Historically there was a very simple synergy between the IBM S/370 Mainframe and its supporting disk I/O (DASD) subsystem, allowing for Mainframe host to physical and logical disk device (I.E. 3390) connectivity. The analysis and tuning of this I/O subsystem has always been and continues to be supported by the SMF Type 7n records via IBM RMF and the BMC CMF alternative. However, over the years, major advances in DASD subsystems and the System z Mainframe server have delivered many layers of technology resources (E.g. Cache, Memory, FICON Channels, RAID Storage, Proprietary Microcode, et al), introducing complexity into the identification of DASD I/O subsystem performance problems.

The focus on technology based metrics (E.g. I/O Rate, Response Time, I/O MB/s Bandwidth, et al) has also been complemented with more meaningful business focussed Service Level Agreements (SLA). Therefore today’s System z I/O Performance Analyst must gather and act upon proactive and meaningful information from the ever-increasing amounts of performance data available. Put another way, too much data can deliver not enough information! As previously stated, it was forever thus; RMF and CMF have always collected the requisite performance data and arguably no other data source is required (E.g. OMEGAMON/TMON/SYSVIEW Performance Monitor, SAS/MXG/MICS/WPS Performance Database). RMF/CMF is the ideal data source for thorough and timely System z I/O performance management, where intelligent analytics and expert knowledge are required to present this “Golden Record”.
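
As a simple illustration of this data reduction challenge, the hedged sketch below assumes the RMF/CMF device activity data has already been exported to a CSV file, one row per DASD device interval; the file name and column names are hypothetical, while the response time components (IOSQ, pending, connect and disconnect) are the standard ones reported for a 3390 device.

    # Minimal data reduction sketch: rank DASD volumes by I/O-weighted response time.
    # Assumes a hypothetical CSV export of RMF/CMF device interval data.

    import csv
    from collections import defaultdict

    def worst_devices(csv_path: str, top_n: int = 10):
        totals = defaultdict(lambda: {"resp": 0.0, "ios": 0.0})
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                resp_ms = (float(row["iosq_ms"]) + float(row["pend_ms"])
                           + float(row["conn_ms"]) + float(row["disc_ms"]))
                volser = row["volser"]
                io_rate = float(row["io_rate"])
                totals[volser]["resp"] += resp_ms * io_rate   # weight by I/O rate
                totals[volser]["ios"] += io_rate
        averages = {v: t["resp"] / t["ios"] for v, t in totals.items() if t["ios"]}
        return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

    # Example usage: print the ten volumes with the highest weighted response time.
    # for volser, resp_ms in worst_devices("rmf_device_intervals.csv"):
    #     print(f"{volser}: {resp_ms:.2f} ms")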

However, today’s System z Support Teams need simple and timely presentation of the data, highlighting potential challenges, graphically presented for their Management, allowing for simple tracking of SLAs and technology changes (I.E. Software/Hardware Upgrades).

Additionally, Workload Manager (WLM) can control non-paging queued DASD I/O requests, based upon device busy conditional processing. Therefore the z/OS system can manage I/O priorities in a Sysplex, based on WLM service class goals. WLM dynamically adjusts the I/O priority based on service class goal performance and whether a DASD device can influence the overall performance objectives. For obvious reasons, this WLM function does not micro-manage I/O priorities, only changing a service class period’s I/O priority infrequently. WLM is deployed by many System z users to assist in the automated management of system resources (E.g. CPU, Memory, I/O, et al), based upon Service Level goals.
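
Purely as a conceptual illustration of that goal-based behaviour (this is not the actual WLM algorithm; the data structure and thresholds are invented for the sketch), an I/O priority might be nudged up when a service class misses its goal and DASD I/O delay is a significant contributor, and nudged down when the goal is comfortably met:

    # Conceptual sketch only; NOT the real WLM algorithm.

    from dataclasses import dataclass

    @dataclass
    class ServiceClass:
        name: str
        performance_index: float    # PI > 1.0 means the goal is being missed
        io_priority: int            # higher value = higher I/O priority
        io_delay_significant: bool  # does DASD I/O delay influence the goal?

    def adjust_io_priority(sc: ServiceClass, min_prio: int = 1, max_prio: int = 15) -> int:
        """Nudge I/O priority up when a goal is missed and I/O delay matters, and
        down when the goal is comfortably met; changes are limited to one step per
        policy interval, so priorities are not micro-managed."""
        if sc.io_delay_significant and sc.performance_index > 1.0:
            sc.io_priority = min(max_prio, sc.io_priority + 1)
        elif sc.performance_index < 0.8:
            sc.io_priority = max(min_prio, sc.io_priority - 1)
        return sc.io_priority

    online = ServiceClass("ONLINE_HI", performance_index=1.3, io_priority=10,
                          io_delay_significant=True)
    print(adjust_io_priority(online))  # 11: goal missed and I/O delay matters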

From a DASD subsystem technology viewpoint, there is no longer an obvious one-to-one direct connection between the Mainframe host and DASD device. An increasing number of technological advances, both microcode and hardware (E.g. Memory, Fibre Channel, Function Assist Processing, et al), have diminished the requirement for data access directly from the physical device. Put another way, in today’s world of System z servers with multiple cache level CPU chips (I.E. Relative Nest Intensity), massive and multiple processor memory resources (I.E. z13 @ 10 TB Memory), high bandwidth Fibre Channel (I.E. FICON, zHPF) subsystems and a hierarchy of DASD memory (I.E. SSD/Flash, Cache), it’s not uncommon to consider an I/O that requires physical device access as a problem! Finally and most importantly, from a DASD subsystem viewpoint, each of the recognized System z DASD providers, EMC (Symmetrix VMAX), HDS (VSP G1000) and IBM (DS8870), has a highly proprietary DASD subsystem that provides z/OS plug compatibility, but delivers overall I/O performance using its own unique architecture and internal algorithms.

Of course, an over-configured hardware environment will deliver a poor TCO, while an under-configured environment will manifest in SLA issues and bad user experiences; the middle ground always delivers the optimal environment. Resource optimization always demands proactive day-to-day management, from an internal and indeed external communication viewpoint. With the highly proprietary design features of the IHV DASD subsystems, whether EMC, HDS or IBM, having the right information and identifying the precise problem simplifies the communication process with the IHV. Such communication might highlight a resource under provision (E.g. Memory Capacity), a subsystem setting tweak requirement, either host or subsystem based, or indeed a hardware failure. In today’s world, these issues need to be fixed in minutes or hours, not days or weeks.

Therefore, where does today’s System z I/O Performance Analyst start to collect the required information to safeguard that their DASD subsystem is optimized, both from a capacity and performance viewpoint?

A simplistic viewpoint of an I/O health-check should consider the following (a minimal sketch follows the list):

  • Service Level Agreements (SLA): Are overall objectives being delivered or missed?
  • User Experience: Are users (customers) complaining of poor service or response times?
  • I/O Metric Performance: Are there obvious signs of abnormal performance statistics?
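
The minimal sketch below simply codifies those three questions; the input values and the 1 ms response time threshold are illustrative assumptions, not recommendations.

    def io_health_check(sla_missed: bool, user_complaints: int, avg_resp_ms: float,
                        resp_threshold_ms: float = 1.0) -> list[str]:
        """Return a list of findings; an empty list means no obvious I/O concern."""
        findings = []
        if sla_missed:
            findings.append("SLA: overall objectives are being missed")
        if user_complaints > 0:
            findings.append(f"User experience: {user_complaints} complaint(s) of poor service")
        if avg_resp_ms > resp_threshold_ms:
            findings.append(f"I/O metrics: average response {avg_resp_ms:.2f} ms "
                            f"exceeds the {resp_threshold_ms:.2f} ms threshold")
        return findings

    print(io_health_check(sla_missed=False, user_complaints=2, avg_resp_ms=1.4))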

Several decades ago, an overall I/O health check might have been a periodic (E.g. Weekly or longer) activity, whereas today it’s undoubtedly a Business As Usual (BAU) and 24*7 activity. Therefore a fully automated solution is required, built upon the tried and tested System z performance fundamentals, namely RMF or CMF. The ideal solution will perform analytics based data reduction, presenting the right information, at the right time, allowing for intelligent business based communication, both internally, to customers and end users from an SLA viewpoint, and externally, with IHV DASD suppliers, safeguarding optimal performance and TCO.

EADM (Easy Analyze DASD Mainframe) is a solution from Technical Storage that performs automated performance analysis of the z/OS I/O subsystem, delivering predictive analytics for better storage capacity planning and performance measurement. The Technical Storage EADM architects have in excess of 40 years of IBM Mainframe experience, specializing in the I/O subsystem, and so it’s no surprise that EADM delivers expert and timely knowledge via an easy-to-use solution.

EADM is an easy-to-install and easy-to-use plug-and-play solution that has no proprietary considerations, requiring no additional System z resources (E.g. CPU, Memory, DASD, et al). Installed on Microsoft server platforms, EADM is easily virtualized via VMware, Hyper-V, et al, requiring no target database for performance data storage. EADM performs a daily health check of the entire System z disk subsystem, working around the clock and delivering customized, automatic and user friendly GUI type reports. For today’s System z technician, the open and IP architecture base of EADM allows for secure remote access via Mobile, Tablet or Laptop devices, as and when required.

Operations and performance teams are alerted as soon as performance variances occur, typically in minutes, assisting in the identification of the underlying root problems causing changes in system behaviour. Incorporating intelligent and meaningful I/O performance indicators, with drill-down and zoom-in ability, storage technicians can determine whether a problem is temporary, permanent, local or global. By simplifying the data reduction process (E.g. RMF/CMF data from numerous LPAR/Sysplex environments), EADM safeguards that the internal technical team can efficiently manage their increasingly complex and large DASD environment, for intelligent and timely communications with internal business teams and external suppliers alike.
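
For illustration only, a generic baseline-deviation check of this kind might look like the sketch below; this is not EADM’s implementation, merely the general concept of alerting when a response time sample deviates from its recent rolling baseline.

    from statistics import mean, stdev

    def variance_alerts(samples_ms: list[float], window: int = 30, sigmas: float = 3.0):
        """Yield (index, value) pairs where a response time sample deviates from
        the rolling baseline by more than `sigmas` standard deviations."""
        for i in range(window, len(samples_ms)):
            baseline = samples_ms[i - window:i]
            mu, sd = mean(baseline), stdev(baseline)
            if sd and abs(samples_ms[i] - mu) > sigmas * sd:
                yield i, samples_ms[i]

    # Example usage: flag any interval whose response time is a 3-sigma outlier.
    # for index, value in variance_alerts(interval_response_times_ms):
    #     print(f"Variance at interval {index}: {value:.2f} ms")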

EADM simplifies the System z I/O subsystem capacity and performance management process, delivering expert reports and timely historical analysis, for example:

  • Automatic daily (24 Hour) analysis of Sysplex wide workload (On-Line TP & Batch) I/O response times
  • Systematic intelligent alerts of early performance variances with exact occurrence time indicators
  • Identification of I/O performance hot-spots with DASD volume and data set level granularity
  • Performance trending at DFSMS Storage Group, Subsystem LCU and DASD volume level
  • DR (E.g. PPRC) simulations to prevent data loss and forecast Data Centre failover scenarios
  • I/O subsystem WLM indicators to determine exactly what impacts performance objectives
  • Full FICON channels and zHPF analysis, incorporating typical I/O throughput indicators
  • HyperPAV and associated LCU indicators to easily balance volumes, optimizing PAV alias allocation
  • Performance monitoring and balancing via intelligent LCU, SSID and I/O analytics
  • DASD capacity usage via DCOLLECT data, comparing assigned vs. allocated vs. actual disk utilization
  • EADM supports entry-level (several LPAR) and complex (multiple CPC/LPAR) System z configurations

A well provisioned and performing System z I/O subsystem is of vital importance for safeguarding today’s ever increasing storage requirements of mission critical business applications. A poorly performing I/O subsystem will generate unnecessary and extra CPU overhead, with potential and tangible TCO impact, in conjunction with possible business impact. Although the advances of the System z server and underlying DASD I/O subsystem can compensate for many application code or data placement issues, the fundamental concepts of analysing and tuning the I/O subsystem remain.

Therefore the savvy and proactive System z customer will safeguard that they find a solution to deliver optimal DASD I/O performance. Without doubt, such an analysis could be performed by a highly-skilled individual, but today’s 21st Century world demands a hybrid of technical and commercial skills. Therefore a solution that incorporates the diagnostic knowledge of the most highly trained technician, performs intelligent analytics on a plethora of Sysplex wide performance data sources and presents the information required, is one that will deliver benefit each and every day. EADM is an example of such a solution, delivering demonstrable System z TCO optimization benefits, while safeguarding a short-term ROI, with simple deployment and resource utilization attributes.

Revisiting The zSeries Mainframe Storage Hierarchy

Recommendation: The next time you perform a zSeries Mainframe server upgrade, consider adding Flash Express cards, for an extra 1.4-5.6 TB of memory-speed storage. Similarly, the next time you perform a zSeries Mainframe DASD subsystem upgrade, consider adding as much SSD (flash memory) capability as you can afford and justify. Both upgrades will deliver significant performance and business benefits, arguably for minimal cost, when considered as a several year TCO investment.

Conceptually the zSeries Mainframe storage hierarchy has comprised the same layers for many decades, while performance and capacity attributes have dramatically increased over time. Although System/390 introduced the concept of Expanded Storage (I.E. Hiperspace, Data Space) in 1990 and there have been various implementations of SSD (E.g. StorageTek 4080), the ability to transparently implement significant capacity memory layers has only recently become possible.

Let’s not forget, the closer data is to that most precious and expensive of resources, namely CPU, the faster it will process. When revisiting the traditional storage hierarchy, we can now consider two new layers, namely Flash Express and Solid State Drive (SSD):

zSeries Storage Hierarchy

I have previously written about the Flash Express layer. Flash Express is a new memory layer within the zSeries Mainframe storage hierarchy, which can be considered as either a Solid State Drive (SSD) or Storage Class Memory (SCM) technology. Flash Express is integrated on PCI Express attached RAID 10 cards, packaged as a two card pair, delivering 1.4 TB of capacity per mirrored card pair. A maximum of 4 card pairs can be configured, delivering up to 5.6 TB of memory capacity, assigned to LPAR resources, just like main memory.

The simplest function to benefit from Flash Express memory would be SVC dump processing, substantially reducing dump capture time.

Flash Express can also be deployed to replace z/OS disk paging, substantially reducing the response time associated (I.E. ~5-20 μs vs. ~10 ms). The benefit for z/OS paging is not the replacement of memory paging, but replacing disk paging with Flash Express storage. Flash Express is suitable for workloads that can tolerate paging, but will not benefit workloads that cannot tolerate paging activity. The fundamental z/OS design for Flash Express memory will not completely remove any virtual storage constraints created by a paging spike, although a modicum of scalability relief is expected due to the faster I/O associated with Flash Express memory.
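
Simple arithmetic on the figures quoted above shows why this matters during a paging spike; the page-in rate used here is a hypothetical workload figure, while the two latencies are taken from the text.

    DISK_PAGE_IN_MS = 10.0            # ~10 ms per page-in from DASD paging data sets
    FLASH_EXPRESS_PAGE_IN_MS = 0.020  # upper end of the ~5-20 us Flash Express range

    page_ins_per_second = 500         # hypothetical paging spike
    # Aggregate page wait accumulated per elapsed second, across all address spaces
    disk_wait = page_ins_per_second * DISK_PAGE_IN_MS / 1000.0
    flash_wait = page_ins_per_second * FLASH_EXPRESS_PAGE_IN_MS / 1000.0
    print(f"DASD paging:          ~{disk_wait:.2f} seconds of page wait per second")
    print(f"Flash Express paging: ~{flash_wait:.3f} seconds of page wait per second")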

In conjunction with Flash Express, there were advancements in the Real Storage Management (RSM) function, including pageable 1 MB Large Page Support. Large Pages (1 MB) deliver increased performance, decreasing the number of Translation Lookaside Buffer (TLB) misses that an application incurs, reducing the time required to convert virtual addresses into physical addresses and reducing the real storage used to maintain DAT structures. The use of Large Pages typically delivers Internal Throughput Rate (ITR) performance benefits of ~1% for IMS, ~3% for DB2 and ~5% for Java workloads.

Although SSD (flash) storage might have been selectively deployed in the zSeries Mainframe Data Centre for the last 5 years or so, the ever increasing requirement for increased Quality of Service (QoS), in terms of data availability and ultra-fast transaction response times, dictates the increased usage of SSD architectures. Entire DASD subsystems can be built upon SSD technologies, or more likely, hybrid subsystems, containing both SSD and traditional HDD technologies. This storage subsystem evolution allows organizations to gain significant competitive advantages, delivering new services for existing and, more importantly, new customers alike.

Using SSD disk subsystems overcomes the limitations of traditional spinning hard disk drives. However, not every enterprise application needs this ultra-high performance; since flash storage still costs more than spinning drives for the same capacity, organizations must be mindful of expenditure and how much flash memory (SSD) they deploy; as always, flexibility is key.

Complete or hybrid SSD I/O subsystems deliver performance and economic advantages for your mission critical business environment:

  • Green Data Centre: ~25-60% energy reduction (flash memory vs. spinning disk)
  • Data Centre Space: ~20-40% smaller footprint (memory cards vs. Hard Disk Drives)
  • Optimal Performance: Consistent ~1-3 ms access (Hard Disk Drives @ ~10 ms)

The utopia is for a self-tuning disk subsystem, automatically redirecting I/O between SSD and HDD, based on file performance and overridden, as and when required, by storage policies. Whether EMC, HDS (HP OEM) or IBM, this self-tuning ability is evolving, while each disk vendor has their own implementation. However, whatever your choice of disk subsystem, the ability to incorporate SSD into your storage hierarchy, either fully or partially, is evident.
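
The decision logic behind such tiering can be pictured along the lines of the sketch below; this is an illustrative heuristic only, as each vendor’s actual SSD/HDD auto-tiering algorithm is proprietary and considerably more sophisticated.

    def place_extent(accesses_per_hour: float, policy_override: str = "") -> str:
        """Decide whether an extent should reside on SSD or HDD.
        A storage policy override always wins, as described above."""
        if policy_override in ("SSD", "HDD"):
            return policy_override
        return "SSD" if accesses_per_hour >= 100 else "HDD"  # threshold is arbitrary

    print(place_extent(250))                         # hot extent -> SSD
    print(place_extent(250, policy_override="HDD"))  # policy pins it to HDD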

In conclusion, ~25 years ago, the zSeries Mainframe user benefitted from faster performance via System/390 Expanded Storage and disk subsystems with cache and DASD Fast Write memory buffers. The cost of such memory storage was a major consideration then, but with good I/O tuning disciplines, the savvy zSeries Mainframe user benefitted from these technology advancements. Flash Express and SSD have the potential to deliver increased performance, for a relatively low cost, and now is the time to embrace these technologies. Ignore the storage hierarchy at your peril; as I previously documented, optimal I/O performance always delivers significant benefit.

Extended Address Volumes (EAV): Pros & Cons

It wasn’t too long ago that the maximum size of a 3390 DASD volume was ~54 GB (65,520 Cylinders) via the 3390-54.  Then with the release of z/OS 1.10, Extended Address Volumes (EAV) were introduced, and a ~4x increase in single device capacity was delivered @ 223 GB (262,668 Cylinders)!  Surely enough storage capacity for anybody?

Of course, we all know that 21st Century data requirements are significant, and so the release of z/OS 1.13 (or z/OS 1.12 with PTFs) has delivered another ~4x increase, with a single device capacity of 1 TB (~1.182 Million Cylinders).  However, let’s not forget that data storage capacity can increase by ~20%+ per annum, so I guess it won’t be too long before we see another 4x+ increase in size, to ~4 TB+…
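
For reference, these capacities follow directly from standard 3390 track geometry (15 tracks per cylinder, 56,664 bytes per track), as the short sketch below shows; the cylinder counts are those quoted above, using 1,182,006 for the ~1.182 million cylinder figure.

    BYTES_PER_TRACK = 56_664
    TRACKS_PER_CYLINDER = 15
    BYTES_PER_CYLINDER = BYTES_PER_TRACK * TRACKS_PER_CYLINDER  # 849,960 bytes

    def cylinders_to_gb(cylinders: int) -> float:
        return cylinders * BYTES_PER_CYLINDER / 1_000_000_000

    print(f"3390-54 (65,520 cylinders):   ~{cylinders_to_gb(65_520):.0f} GB")   # commonly quoted as ~54 GB
    print(f"EAV @ z/OS 1.10 (262,668):    ~{cylinders_to_gb(262_668):.0f} GB")  # ~223 GB
    print(f"EAV @ z/OS 1.13 (1,182,006):  ~{cylinders_to_gb(1_182_006)/1000:.2f} TB")  # ~1 TB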

EAV implementation relieves disk capacity constraints and allows storage growth without adding more devices.  In today’s world of TCO optimization and a utopia of very short-term ROI, EAV usage will reduce TCO, primarily personnel and environmental (E.g. Power, Cooling, Floor Space) related.  Potentially the ability to manage more data with fewer DASD volumes simplifies the Storage Administration process, therefore increasing the number of TB managed by each technician.  Typically, additional capacity (EAV) can be added dynamically, increasing DASD volume capacity online via the Dynamic Volume Expansion (DVE) function.

Theoretically (as per current architectural constraints) a 3390 EAV can grow to 225 TB; the realm of possibility exists!

The pros of EAV implementation seem obvious, a significant capacity increase in a single footprint, easy implementation, with demonstrable TCO benefits; but is all that glisters always gold?

Learning from history is always a good thing and if we consider the challenges of adopting the 3390-9/27/54 device, did we encounter any capacity optimization issues?  As a single device increases in size, device occupancy might become a challenge.  For example, 90% occupancy of a 3390-54 @ 54 GB is ~48.5 GB, or put another way, ~5.4 GB is allocated but never used.  So if we apply the same metric to a 1 TB device, you guessed it, ~100 GB is allocated and never used…
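
Generalising that occupancy arithmetic (the 90% occupancy figure is the text’s example, not a recommendation):

    def unused_gb(volume_gb: float, occupancy_pct: float) -> float:
        """Capacity that is configured but sits empty at a given occupancy level."""
        return volume_gb * (1 - occupancy_pct / 100.0)

    for size_gb in (54, 223, 1_000):  # 3390-54, first EAV size, 1 TB EAV
        print(f"{size_gb:5d} GB volume at 90% occupancy: ~{unused_gb(size_gb, 90):.1f} GB unused")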

“So what”, they say.  Indeed the separation of the physical and logical device eliminates any physical space utilization considerations, but what about the number of data sets and, more importantly, extents on that EAV or even 3390-54 DASD volume?  An issue that has plagued many Mainframe installations is disk fragmentation; no matter how big a DASD volume, successful data set allocation is sometimes dependent upon sufficient contiguous extents to satisfy primary allocation or secondary extension.

At first glance, the process of defragmentation is very simple (DFSMSdss DEFRAG, FDR/CPK COMPAKTOR, et al), but typically these processes require minimal data set allocation activity and are batch orientated. DASD enqueue time is a consideration, as these traditional Mainframe defrag solutions can generate significant enqueue activity for the VTOC and data sets alike. Can the 21st Century business that requires near 24*7 data availability allocate sufficient time (E.g. a minimal processing window) to perform such manual defragmentation activities? If only defragmentation could be transparent, automated and dynamic…

RealTime Defrag (RTD) is one such option, deploying a multi-faceted approach to deliver “on-line defrag” (a conceptual sketch follows the list):

  1. Release – Release allocated but unused space for all data set types
  2. Combine – Combine extents, reducing the number of allocated extents for optimized performance and SE37 abend eradication
  3. Defrag – Reorganize data sets into contiguous groups, increasing size of free extents, optimizing performance and SB37 abend eradication
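
Conceptually, the Defrag step can be pictured as packing allocated extents together so that the remaining free space becomes one large contiguous extent, as in the illustrative sketch below; this is not RealTime Defrag’s actual processing, merely a picture of the principle.

    def largest_free_extent(volume_cyls: int, allocated: list[tuple[int, int]]) -> int:
        """allocated: (start_cylinder, length) pairs; returns the largest free gap."""
        largest, cursor = 0, 0
        for start, length in sorted(allocated):
            largest = max(largest, start - cursor)
            cursor = start + length
        return max(largest, volume_cyls - cursor)

    def compact(allocated: list[tuple[int, int]]) -> list[tuple[int, int]]:
        """Relocate every extent to the lowest available cylinder, in order."""
        packed, cursor = [], 0
        for _, length in sorted(allocated):
            packed.append((cursor, length))
            cursor += length
        return packed

    extents = [(0, 100), (500, 200), (5_000, 300), (40_000, 100)]
    print(largest_free_extent(65_520, extents))           # 34,700 cylinders before
    print(largest_free_extent(65_520, compact(extents)))  # 64,820 cylinders after packing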

In conclusion, EAV deployment can only be a good thing, delivering demonstrable TCO benefits in the form of dramatic single-footprint (I.E. Disk Subsystem) capacity increases.  RealTime Defrag can also increase service availability, eradicating the requirement for manual and batch orientated defrag activities, while safeguarding that installed disk capacity is optimized, EAV or not.