Data Destruction: Is Your IBM Mainframe Mission Critical Data Always Safe?

Data loss incidents expose businesses and their partners, customers and employees to a plethora of risks and associated problems.  Typically, opportunistic, unauthorized or rogue access to sensitive, personal, confidential and Mission Critical data all too often results in problems such as identity theft and competitive business challenges, to name but a few, which adversely impact many areas, including but not limited to:

  • Business reputation and perception
  • Monetary loss via non-compliance penalties and associated litigation
  • Media coverage
  • Personal consumer credit ratings

Unless businesses implement proactive processes to secure data from creation to destruction, vis-à-vis cradle to grave, data loss challenges might ensue.  In fact, millions of individuals are impacted by data loss every year, as criminals become ever more sophisticated in gaining unauthorized access to information.  The increasing dependence on technology means the associated collateral damage risk will continue to grow.  Today there is no such thing as a low-risk organization or low-risk personal information, and so it follows that business trustworthiness and data security should be a primary concern.

The full and complete destruction, and thus secure erasure, of data is a mandatory requirement of both business and government regulations, in addition to the policies deployed by each and every business.  Regulatory compliance examples include the EU Data Protection Directive, the Payment Card Industry Data Security Standard and the Sarbanes-Oxley Act, supplemented by many other compliance mandates encompassing the UK, Europe, the USA and indeed globally.  There are many occasions when data destruction is required, for example:

  • When disks move to another location for reuse or interim storage
  • When a lease agreement matures and disks are returned to the vendor or sold on to the second-user market
  • Following a Disaster Recovery (DR) test, where 3rd party disk and tape devices are used for testing purposes
  • The reuse of disk or tape by a different company group
  • Before discarding and thus scrapping disks and tapes that are to leave the Data Centre

Specifically, the Payment Card Industry Data Security Standard states:

9.10.2 Render cardholder data on electronic media unrecoverable so that cardholder data cannot be reconstructed; Verify that cardholder data on electronic media is rendered unrecoverable via a secure wipe program in accordance with industry-accepted standards for secure deletion, or otherwise physically destroying the media (for example, degaussing).

Similarly, NIST Special Publication 800-88 (Guidelines for Media Sanitization) states:

Clear; One method to sanitize media is to use software or hardware products to overwrite storage space on the media with non-sensitive data.  This process may include overwriting not only the logical storage location of a file(s) (e.g., file allocation table) but also may include all addressable locations.  The security goal of the overwriting process is to replace written data with random data.  Overwriting cannot be used for media that are damaged or not rewriteable.  The media type and size may also influence whether overwriting is a suitable sanitization method [SP 800-36: Media Sanitizing].

These data erasure (destruction, cleaning, clearing, wiping) methods would be a precursor to supplementary actions such as purging, destruction and disposal of storage media and devices.

Clearly the IBM Mainframe environment is no different to any other; the requirement to safeguard that data is always secure is of paramount importance, as well as being mandatory.  Each and every Mainframe data centre will from time-to-time complete some or all of the following activities:

  • Replace tape media (E.g. upgrade, damage, replacement activity, et al)
  • Replace disk subsystems (E.g. upgrade, end of lease, replacement activity, et al)
  • Disaster Recovery test (E.g. utilize 3rd party Data Centre for data restoration)

The major consideration for safeguarding data security in these instances is when the data and/or related storage media is moved off-site, outside of the primary Data Centre infrastructure.  So maybe we should ask ourselves: why is it that when we send data electronically, we safeguard the data with encryption, but when the data or related storage media is physically moved outside of the Data Centre, we don’t necessarily apply the same high levels of data security?

There are z/OS guidelines provided for erasing disk data, primarily relating to use of the ICKDSF TRKFMT function with the ERASEDATA and CYCLES parameters.  It is generally accepted that this process is slow and does not erase data to the exacting standard required by regulatory compliance mandates.  Similarly, DFSMSrmm users can use the EDGINERS ERASE function to erase data and even shred encryption keys for cartridge volumes created by high-function cartridge subsystem drives (I.E. TS1120, TS1130), but once again, this process might be considered slow and is limited to those users deploying the DFSMSrmm subsystem, whereas other Tape Management Subsystems are widely deployed (E.g. AutoMedia/ZARA, CA-1, CA-Dynam/TLMS, CONTROL-T, et al).

There are other options available from the ISV market that have been specifically developed to erase data securely, including FDRERASE or SAEerase for disk data and FATS/FATAR for tape data, but wouldn’t it be useful if there was one software product that could erase both disk and tape data for IBM Mainframe environments?

Unlike other competitive solutions that are specialized for one particular storage media, either disk or tape, XTINCT performs a secure data erase for both disk and tape data.  XTINCT meets all the requirements of US Department of Defense 5220.22-M (Clearing and Sanitization Matrix for Clearing Magnetic Disk) by overwriting all addressable locations with a single character.  XTINCT also meets the sanitization requirement by overwriting all addressable locations with a character, its complement, then a random character, followed by final verification.  XTINCT meets the requirements of most users by overwriting the tape and using the high-speed data security erase patterns.  It should be noted that, for tapes, the DoD only considers degaussing or pulverizing the tape to be a valid erase!
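
For illustration only, the overwrite patterns described above (a single-character pass for clearing; a character, its complement and a random character, followed by verification, for sanitization) can be sketched in a few lines of Python.  This is a conceptual model operating on an ordinary file; it is not XTINCT, and it says nothing about track-level erasure of real DASD or tape media:

```python
import os
import secrets

def overwrite_pass(f, size, byte_value, chunk=1024 * 1024):
    """One pass: write a single character value over every addressable byte."""
    f.seek(0)
    remaining = size
    block = bytes([byte_value]) * chunk
    while remaining > 0:
        n = min(chunk, remaining)
        f.write(block[:n])
        remaining -= n
    f.flush()
    os.fsync(f.fileno())

def verify_pass(f, size, byte_value, chunk=1024 * 1024):
    """Read back and confirm every byte matches the final overwrite pass."""
    f.seek(0)
    remaining = size
    while remaining > 0:
        data = f.read(min(chunk, remaining))
        if any(b != byte_value for b in data):
            return False
        remaining -= len(data)
    return True

def clear(path):
    """'Clear': a single fixed-character pass over all addressable locations."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        overwrite_pass(f, size, 0x00)

def sanitize(path):
    """'Sanitize': a character, its complement, then a random character, then verify."""
    char = 0x55
    passes = [char, char ^ 0xFF, secrets.randbelow(256)]
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for value in passes:
            overwrite_pass(f, size, value)
        return verify_pass(f, size, passes[-1])
```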

In addition to providing a complete audit trail and comprehensive reports to satisfy regulators, XTINCT surpasses NIST guidelines for cleaning and purging data.  XTINCT also satisfies all federal and international requirements including Sarbanes-Oxley Act, HIPAA, HSPD-12, Basel II, Gramm-Leach-Bliley and other data security and privacy laws.

From a resource efficiency viewpoint, XTINCT is re-entrant and fully supports sub-tasking.  Multiple volumes can be processed asynchronously, whereas other tools, like ICKDSF, run serially.  XTINCT makes extensive use of channel programs, so many functions operate at peak efficiency, using only enough CPU time to generate the channel programs, with the rest of the operation being carried out by the channel subsystem.  This means that XTINCT does not consume excessive amounts of valuable CPU time.

The method chosen to safeguard that Mainframe disk and tape data is securely erased, destroyed, cleaned, purged, and so on, is somewhat arbitrary; what matters is that the process is actually deployed, primarily from a business viewpoint of protecting valuable consumer data in all circumstances, regardless of the mandatory regulatory security requirements.  Ultimately the Mainframe Data Centre just needs to make a minor enhancement to the data management lifecycle model to guarantee data security in all circumstances, in this instance, when physical data media (I.E. disks, tapes) moves outside of the primary Data Centre location.

IBM Mainframe Capacity Planning & Software Cost Control Interaction?

The cost of IBM Mainframe software is an extensive subject matter that is multi-faceted and can generate much discussion. The importance of optimizing Mainframe software costs is without doubt, as it is the most significant Mainframe TCO component, having increased from ~25% to 50%+ of overall expenditure over the last decade or so. Conversely, Mainframe server hardware costs have largely stabilized at ~15-25% of TCO in the same time period. However, Mainframe Capacity Planning activities have evolved over several decades, during which hardware costs were the primary concern and the number of IBM Mainframe software pricing mechanisms was limited. Of course, in the last decade or so, IBM Mainframe software pricing mechanisms have evolved, with a plethora of acronyms, ESSO, ELA, IPLA, OIO, PSLC, WLC, VWLC, AWLC, IWP, naming but a few!

Can each and every IBM Mainframe user clearly articulate their Mainframe Capacity Planning and Software Cost Control policies, and which person in their organization performs these very important roles? Put another way, not forgetting Software Asset Management (SAM), should there be a Software Cost Control specialist for IBM Mainframe Data Centres…

If we consider the traditional Mainframe Capacity Planning role, put very simply, this process typically produces a 3-5 year rolling plan, based upon historical data and future capacity requirements. These requirements can then be modelled with the underlying hardware (E.g. z10, z114/z196, zEC12) server, identifying resource requirements accordingly, namely number of General Processors (GPs), Specialty Engines (E.g. zIIP, zAAP, IFL), Memory, Channels, et al. Previously, up until ~2005, customer requirements would be articulated to IBM, cross-referenced with LSPR (Large System Performance Reference) and an optimum hardware configuration derived. Since ~2005, IBM made their zPCR (Processor Capacity Reference) tool Generally Available, allowing the Mainframe customer to “more accurately” capacity plan for IBM zSeries servers.

Other enhancements to more accurately determine the ideal zSeries server include sizing based on actual customer usage data generated by the CPU MF (CPU Measurement Facility) introduced with the z10 server. CPU MF refines the zPCR process with real-life customer usage data, rather than relying solely on the standard simulated LSPR workloads.

In summary, the Mainframe Capacity Planning process has evolved to include new tools and data to refine the process, but primarily, the process remains the same, size the hardware based upon historical data and future business requirements. However, what about Mainframe software usage and therefore cost interaction?

Each and every IBM Mainframe user relies heavily on the IBM Operating System (I.E. z/OS, z/VM, z/VSE, zLinux, et al) and primary subsystems (I.E. CICS, DB2, MQ, IMS, et al). Some Mainframe users might deploy alternative database and transaction processing (TP) solutions, but a significant amount of Mainframe software cost is for IBM software products. In the late 1990s, IBM introduced PSLC (Parallel Sysplex License Charges), which offered lower aggregate (MSU) pricing for major IBM software products, based upon an eligible configuration (E.g. Resource Sharing). This pricing mechanism demanded no software cost control effort; in fact quite the opposite, it was a significant cost benefit simply to implement PSLC!

In 2000 IBM announced Workload License Charges (WLC), which allowed users to pay for software based upon the workload size, as opposed to the capacity of the machine; thus the first signs of sub-capacity pricing. In 2001, the ability to deploy eligible IBM software on a “pay for what you use” basis arrived, via the Variable Workload License Charge (VWLC) mechanism. Put very simply, a Rolling 4 Hour Average (R4HA) MSU metric applies for eligible IBM software products, where software is charged based upon the peak MSU usage during a calendar month. The Mainframe user pays for VWLC software based upon the R4HA or Defined Capacity (Sub-Capacity vis-à-vis Soft Capping), whichever is lowest.
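
As a rough illustration of the R4HA arithmetic, the sketch below computes a peak rolling 4-hour average from hypothetical, evenly spaced MSU samples and applies a Defined Capacity cap; the real calculation is performed by IBM's sub-capacity reporting against SMF data, so treat this purely as a conceptual model:

```python
from collections import deque

def peak_r4ha(msu_samples, interval_minutes=5):
    """Peak Rolling 4 Hour Average (R4HA) from evenly spaced MSU samples."""
    window_len = (4 * 60) // interval_minutes      # samples in a 4-hour window
    window = deque(maxlen=window_len)
    peak = 0.0
    for msu in msu_samples:
        window.append(msu)
        if len(window) == window_len:              # only consider full 4-hour windows
            peak = max(peak, sum(window) / window_len)
    return peak

def monthly_billable_msu(msu_samples, defined_capacity=None):
    """Sub-capacity charging: the lower of the peak R4HA and any Defined Capacity (soft cap)."""
    peak = peak_r4ha(msu_samples)
    return min(peak, defined_capacity) if defined_capacity is not None else peak

# Hypothetical month of 5-minute samples: a quiet baseline with a batch-window spike.
samples = [300] * 2000 + [900] * 48 + [300] * 2000
print(monthly_billable_msu(samples))                        # peak R4HA
print(monthly_billable_msu(samples, defined_capacity=500))  # capped by Defined Capacity
```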

From this point forward, and for the avoidance of doubt, for the last 10 years or so, there has been a mandatory requirement to consider the impact of IBM WLC software costs, when performing the Mainframe Capacity Planning activity. One must draw one’s own conclusions as to whether each and every Mainframe user has the skills to know the intricacies of the various software (E.g. IPLA, OIO, PSLC, WLC, et al) pricing models, when upgrading their zSeries server.

With the IBM Mainframe Charter in 2003, IBM stated that they would deliver a ~10% technology dividend benefit, loosely meaning that for each new Mainframe technology model (I.E. z9, z10), a ~10% lower MSU rating applied for the same system capacity level, when compared with the previous technology. Put another way, a potential ~10% software cost reduction for executing the same workload on a newer technology IBM Mainframe; so encouraging users to upgrade. However, the ~10% software cost reduction is subjective, because a higher installed MSU capacity dictates lower per-MSU software costs…

With the introduction of the z196 and z114 Mainframe servers, the technology dividend was delivered in the form of a new software license charge, AWLC (Advanced Workload License Charges), where lower software costs only applied if this new pricing model was deployed. A similar story applies for the zEC12 server; the AWLC pricing model is required to benefit from the lower software costs! If these software pricing evolutions were not enough, in 2011 IBM introduced the Integrated Workload Pricing (IWP) mechanism, offering potential for lower software pricing based upon workload type, namely WebSphere-eligible workloads. Finally, and as previously alluded to, as MSU capacity increases, the related cost per MSU for software decreases, so there are many IBM software pricing mechanisms to consider when adding Mainframe CPU capacity. So once again, who is the IBM Mainframe Software Cost Control specialist in your organization?

For sure, each and every IBM Mainframe user will engage their IBM account team as and when they plan a Mainframe upgrade process, but how much “customer thinking is outsourced to IBM” during this process? Wouldn’t it be good if there was an internal “checks & balances” or due diligence activity that could verify and refine the Mainframe Capacity Plan with IBM software cost control intelligence?

Having travelled and worked in Europe for 20+ years, I know my peers, colleagues and friends that I have encountered can concur with my next observation. The English and Americans might come up with a good idea and perhaps product, the French are most likely to test that product to destruction and identify numerous new features, while the Germans will write the ultimate technical manual…

zCost Management are a French company that specializes in cost optimization services and solutions for the IBM Mainframe. From an IBM Mainframe Capacity Planning & Software Cost Control Interaction viewpoint, they have developed their CCP-Tool (Capacity and Cost Planning) software solution. This software product bridges the gap between Mainframe Capacity Planning for hardware and the impact on associated IBM software (E.g. WLC, IPLA, et al) costs.

CCP-Tool facilitates medium-term (E.g. 3-5 year) Mainframe Capacity Planning by controlling Monthly License Charges (MLC) evolution, generating cost control policies, optimizing zSeries (E.g. PR/SM) resource sharing and delivering financial management via IBM Mainframe software cost control activity. CCP-Tool integrates with existing data and activities, using SMF Type 70 & 89 records, defining events (I.E. Capacity Requirements, Workload Moves) in the plan, simulating many options, delivering your final capacity plan and periodically (I.E. Quarterly) reviewing and revising the plan. Most importantly, CCP-Tool deploys many algorithms and techniques aligned to IBM software pricing mechanisms, especially those related to WLC and the R4HA.

Therefore CCP-Tool delivers a financial management framework via a medium-term Capacity Plan with associated software cost control and zSeries (E.g. PR/SM) resource policies. This enables a balanced viewpoint of future Data Centre cost configurations from both a hardware and related IBM Mainframe software viewpoint. Moreover, for those IBM Mainframe users that don’t necessarily have the skills to perform this level of Mainframe cost control, CCP-Tool delivers a low cost solution to empower the Mainframe customer to engage IBM on an equal footing, at least from a reporting viewpoint. Similarly, for those Mainframe users with good IBM Mainframe software cost control skills, CCP-Tool offers a “checks & balances” viewpoint, delivering that all important due diligence sanity check! Quite simply, CCP-Tool simplifies the process of reconciling the optimal configuration both from an IBM Mainframe hardware and related software viewpoint.

Without doubt, if a Mainframe user still deploys a hardware-centric viewpoint of the capacity planning activity, without considering the numerous intricacies of IBM Mainframe software pricing, in most cases this could be a significant cost oversight. Put very simply, a low-end IBM Mainframe user of ~150 MSU (1,000 MIPS) might spend ~£1,000,000 per annum just for a minimal configuration of z/OS, CICS, COBOL and DB2 software, so one must draw one’s own conclusions regarding the potential cost savings when deploying the optimal zSeries hardware and associated IBM software configuration. I paraphrase Oscar Wilde:

“The definition of a cynic is someone that knows the price of everything, and the value of nothing!”

So, let’s reprise. You have performed your Mainframe Capacity Planning activity, considered historical SMF data for CPU usage, maybe including the R4HA metric, factored in additional new and growth business requirements, refined the capacity plan by using the zPCR tool, perhaps with data input from CPU MF and you now have identified your optimum zSeries Mainframe server?

Maybe you should think again, because the numerous IBM MLC software pricing mechanisms could impact your tried and tested Mainframe CPU hardware planning process. Firstly, for MLC software, the unit cost per MSU reduces as the installed MSU capacity increases. In simple terms, this encourages the use of “large container” processing entities, LPARs and CPCs. Other AWLC and IWP related considerations further encourage the use of major subsystems (E.g. CICS, DB2, WebSphere, IMS) in larger MSU capacity LPARs and CPCs, to benefit from the lowest unit cost per MSU. Additionally, do you really need to run all software on all processing entities? For example, programming languages (E.g. COBOL, PL/I, HLASM, et al) are not necessarily required in all environments (E.g. Test, Development, Production, et al). It is not uncommon for compile and link-edit functions to be processed in Development environments only, while only run-time libraries are required for Production. These “what if” scenarios generated by the numerous IBM MLC software pricing mechanisms must be considered, ideally by an internal resource with the requisite skills and experience.
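
The “large container” effect can be illustrated with a toy tiered-pricing model. The price bands below are entirely hypothetical (they are not IBM price points), but they show how the same total MSU consumption costs less when aggregated into one larger entity, because more of it falls into the cheaper bands:

```python
# Hypothetical, illustrative MLC price bands (not IBM's actual price points):
# the unit price per MSU falls as the aggregated MSU capacity rises.
PRICE_BANDS = [            # (band upper bound in MSU, price per MSU per month, GBP)
    (45, 1500.0),
    (175, 1000.0),
    (315, 700.0),
    (float("inf"), 500.0),
]

def monthly_mlc_cost(msus):
    """Tiered cost: each MSU is charged at the rate of the band it falls within."""
    cost, lower = 0.0, 0
    for upper, rate in PRICE_BANDS:
        band = min(msus, upper) - lower
        if band <= 0:
            break
        cost += band * rate
        lower = upper
    return cost

# Two separate 100 MSU entities versus one aggregated 200 MSU "large container".
separate = 2 * monthly_mlc_cost(100)
combined = monthly_mlc_cost(200)
print(separate, combined, combined < separate)   # the larger container costs less per MSU
```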

Today, who is performing this Mainframe Software Cost Control in your organization? Is it an internal resource with the requisite skills, an independent 3rd party, IBM or nobody? One must draw one’s own conclusions as to whether any of these parties who could perform this vital activity has a vested interest or not, and thus a potential conflict of interests…

What is my RACF technical policy? Could it be NIST DISA STIG based?

In the late 1980s I was lucky enough to work at a Mainframe site that was performing early testing for MVS/ESA and DFP Version 3, namely DFSMS. So what, you say! The main ethos of DFSMS was in fact System Managed Storage and the ability to define policies to manage data, where the system (DFSMS) would implement these policies. Up until this point, there was no easy way of controlling data set allocation and managing storage space.

Conversely, the three mainstream Mainframe security subsystems, in no particular and so alphabetical order, ACF2, RACF (Security Server) and Top Secret have always had the ability to define a security policy and for said policy to be processed as and when the associated resource was accessed. So why is it that so many security risk registers are full of “things to do” from a security viewpoint? Where did it all go wrong?

In the late 1980s, Guide and SHARE, in Europe anyway, were separate entities, and these organizations had some influence with IBM regarding the direction of various IBM Mainframe technologies. Not much has changed as of today, except that the organizations have merged and, for Europe, we now have GSE. From a DFSMS viewpoint, there was a significant amount of user input regarding how DFSMS might be shaped and the “System Managed Storage” ethos. I wonder whether such user input, or indeed focussed collaboration from Mainframe security gurus, might help with the RACF or Mainframe security technical policy challenge?

Having worked with multiple IT security focussed organizations (E.g. NIST, DoD) over the last few decades, Vanguard Integrity Professionals has been actively involved in creating and evolving NIST DISA STIG Checklists. These checklists (currently 300+ checks and growing steadily) provide a comprehensive grounding for z/OS RACF policy checking, and seemingly are gaining momentum in being accepted as a good starting point to assist organizations define and monitor their z/OS RACF (ACF2 & Top Secret in the near future) policy.

These DISA STIG checklists contain step-by-step instructions for customers to ensure secure, efficient and cost-effective information security that is fully compliant with recognized security standards, and therefore Mainframe security standards. Being fully compliant with the DoD DISA STIG for IBM z/OS Mainframes, these checklists provide organizations with the necessary procedures for conducting a Security Readiness Review (SRR) prior to, or as part of, a formal security audit.

To increase automation and thus reduce cost, Vanguard has optimized the DISA STIG checking process with their Configuration Manager solution. Configuration Manager can perform Intrusion Detection DISA STIG checks and report findings in just a few hours instead of the hundreds or thousands of hours it may take using standard methods. Potentially, Configuration Manager enables organizations to easily evolve from periodic compliance reporting to continuous monitoring.
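
To make the idea of automated checklist evaluation concrete, here is a minimal, hypothetical Python sketch: it assumes security settings have already been extracted into a simple key/value form and evaluates a handful of illustrative checks against them. The check IDs, setting names and thresholds are invented for illustration; they are not the actual DISA STIG checklist content, nor Vanguard Configuration Manager's implementation:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Check:
    check_id: str
    description: str
    test: Callable[[Dict[str, str]], bool]

# Invented example checks; the real DISA STIG checklists contain hundreds of items.
CHECKS: List[Check] = [
    Check("EXAMPLE-0001", "Password change interval is 90 days or less",
          lambda s: int(s.get("PASSWORD_INTERVAL", "255")) <= 90),
    Check("EXAMPLE-0002", "Inactive userids are revoked within 35 days",
          lambda s: int(s.get("INACTIVE", "255")) <= 35),
    Check("EXAMPLE-0003", "PROTECTALL is active in fail mode",
          lambda s: s.get("PROTECTALL", "NO") == "FAIL"),
]

def run_checks(settings: Dict[str, str]) -> List[Tuple[str, bool]]:
    """Evaluate every check against the extracted settings and print a findings report."""
    findings = []
    for check in CHECKS:
        passed = check.test(settings)
        findings.append((check.check_id, passed))
        print(f"{check.check_id}: {'PASS' if passed else 'FAIL'} - {check.description}")
    return findings

# Illustrative settings, as if extracted from SETROPTS-style output beforehand.
run_checks({"PASSWORD_INTERVAL": "60", "INACTIVE": "30", "PROTECTALL": "FAIL"})
```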

Maintaining tight control over the security audit and compliance process is a critical imperative for today’s enterprises. To comply, enterprises must show that they have implemented procedures to prevent unauthorized users from accessing corporate and personal data. Even if enterprises have the means to efficiently conduct audits, they often lack the tools necessary to prevent policy and compliance violations from reoccurring. As a result, security vulnerabilities remain a constant threat, exposing companies to potential sanctions and eroding the confidence of investors and customers.

As a result, the process of meeting compliance standards such as those found in the Combined Code issued by the London Stock Exchange (LSE) and the Turnbull Guidance (the Sarbanes-Oxley equivalent for publicly traded companies in the UK), the Data Protection Act 1998 (and, for the public sector, the Freedom of Information Act 2000), the regulations promulgated by the Financial Services Authority (FSA) (the FSA has oversight over the various entities that make up the financial services industry), standards set by Basel II, the Privacy and Electronic Communications Regulations of 2003, the HMG (UK Government) Security Policy Framework, the Payment Card Industry Data Security Standard (PCI DSS) and various UK criminal and civil laws, represents one of IT’s most critical investments.

As a consequence, managing security in the Mainframe environment is becoming an increasingly difficult task as the list of challenges grows longer every day. Even the most experienced Security Administrators can labour under the workload as security systems increase in size and networks grow in density.

So where does the Mainframe Security Administrator start to make sense of how to achieve security compliance for their particular business?

Although there are many security and compliance regulatory requirements with supporting policy frameworks, none of these high-level mandates actually drills down to the technical level and thus provides RACF or equivalent Mainframe (ACF2, Top Secret) policy guidelines!

There are synergies between various global organizations that define security standards. This is certainly true for NIST and ISO/IEC. Page viii of the NIST SP-800-53 policy states:

NIST is also working with public and private sector entities to establish specific mappings and relationships between the security standards and guidelines developed by NIST and the International Organization for Standardization and International Electrotechnical Commission (ISO/IEC) 27001, Information Security Management System (ISMS).

Furthermore, the seemingly ubiquitous Annex A of ISO 27001 is also cross-referenced by this NIST SP-800-53 policy, and the various controls and monitoring points are cross-referenced, where largely the requirements of the NIST standard are mapped in the ISO/IEC standard, and vice versa. Please refer to Appendix H of the NIST SP-800-53 policy, which cross-references NIST SP-800-53 with ISO/IEC 27001 (Annex A) controls.

Put simply, one must draw one’s own conclusions as to the robustness of NIST SP-800-53 vs. ISO/IEC 27001, but both security standards seem to have commonality as robust and acceptable security standards. So maybe Mainframe users all over the world can define and deploy a generic and robust baseline Mainframe technical security policy, vis-à-vis, the NIST DISA STIG checklists…

We must also recognize the IBM Health Checker for z/OS, which also has the ability to perform automated RACF policy checks. This facility includes some ready-to-go checks for standard system services, in conjunction with a facility that allows the user to define their own policy checking rules. Without doubt, the IBM Health Checker for z/OS is a worthy resource that should be leveraged, but for RACF, if each Mainframe user defines their own policy checking rules, there is the possibility of significant duplication of effort. For the avoidance of doubt, although RACF resource naming standards might be unique to each and every Mainframe user, there is a commonality in the ISV (E.g. ASG, BMC, CA, Compuware, IBM, SAS, et al) software subsystems and products they deploy. If only we could all benefit from previous lessons learned by “standing on the shoulders of giants”!

Perhaps the realm of opportunity exists. There are many prominent Mainframe security giants actively involved today, including the authors of ACF2, Vanguard Software and Consul (Tivoli zSecure), naming but a few. Is it possible that there could be one common standard that might be used as a technical policy template, based upon the ubiquitous 80/20 rule? Deploying this baseline would deliver 80% of the work required for 20% of the effort, where each Mainframe customer just customizes the policy as per the resource naming standards in their Mainframe Data Centre. Equally, the user would have the ability to contribute to this template, perhaps for a niche software product that requires security policy checks but is deployed at maybe only tens of Mainframe customer sites globally.

Vanguard has clearly put a lot of effort into evolving the DISA STIG resource for the Mainframe, and IBM has its RACF Health Checker, but what about one overseeing independent organization, which could benefit from the experience of Mainframe security specialists and, moreover, real-life field experience from Mainframe users globally, implementing and refining these standards? Wasn’t that the essence and spirit of Guide and SHARE several decades ago, listening to Mainframe users and evolving Mainframe technology accordingly? Of course SHARE in the USA, Guide Share in Europe, and IUGC and APUGC in the APAC region still perform this function admirably, but seemingly with the NIST DISA STIG resource, we already have a great baseline to leverage.

What is the size and shape of this potential task? Ideally, it is to identify each z/OS software product that has specific interaction with the security subsystem (I.E. ACF2, RACF, Top Secret), typically via resource profiles. For a z/OS software product to be developed, the ISV will have interacted with IBM, initially via their PartnerWorld resource for product development, and eventually via the IBM Global Solutions Directory from a marketing viewpoint. As of Q4 2012, the IBM Global Solutions Directory contains ~1,800 ISVs with z/OS based software products.

However, recognizing there are already good security resource checklist templates in existence, vis-à-vis the solid foundation primarily provided by Vanguard via the DISA STIG checklists, the best organizations to add to these DISA STIG checklists are the ISVs themselves. The ISV has the most knowledge about their product, having written the code and supporting documentation for security related controls; so a modicum of effort from each ISV whose products include specific security resource checking seems the best way forward.

In 2003 IBM launched the Mainframe Charter initiative, demonstrating their commitment to the Mainframe platform, where they adopted nine principles organized under the pillars of innovation, value and community. Although this was an IBM initiative, wouldn’t it be great for the Mainframe ISV to proactively be part of this global Mainframe community, and assist their Mainframe customers in simplifying the activity of implementing and monitoring their Mainframe security technical policy? Not every ISV will have software products with specific security resource interaction, and therefore not every product in the ISV software portfolio will require a security checklist. The amount of work per software product to create a template might only be several hours, so could the ISV produce these checklists as part of their day-to-day customer support activities?

Is it possible that globally, we can all participate and collaborate in a focussed and Mainframe security centric group, to define a technical policy template that will assist all Mainframe customers satisfy regulatory compliance mandates? No one of us is as good as all of us…

Mainframe Virtual Tape: Tape On Disk; But For How Long?

By definition, a Virtual Tape Library (VTL) solution uses a disk cache to store tape data files, but for how long is this data retained on disk? Is it minutes, hours, days, weeks or indefinitely? Only business requirements can dictate the time period tape data is stored on disk, which will influence the VTL solution chosen. We will return to this pivotal question later in the article…

Some might say (for some reason I’m thinking of an Oasis lyric) that Mainframe Virtual Tape choice is as simple as black and white; or blue (IBM) and red (Oracle AKA StorageTek). Hmmm, clearly this is not the case; there are grey areas, but moreover, there are many colours to choose from. For sure we must recognize the innovation in tape technologies by StorageTek, delivering the first Automated Tape Library (ATL, NearLine), and IBM, delivering the first Virtual Tape Library (VTL, VTS), naming but a few. Of course, now I recall, IBM delivered VTS in the mid-1990s, about the same time as that Oasis song!

There is also that age old debate as to whether tape is dead or not and the best compromise always seems to be, “we’ll have to agree to disagree”, depending upon your viewpoint. Does it matter?

I also recall the early 1990s, when Mainframe disk was proprietary and based upon 1:1 mapping; a physical disk was the addressable DASD volume. The promise of Iceberg (AKA SVA) from StorageTek and the delivery of Symmetrix by EMC changed this status quo, and so the Mainframe world adopted logical to physical mapping for disk storage, via RAID technologies, with Just a Bunch Of Disks (JBOD). This was significant, as the acquisition cost per MB for Mainframe disk was ~£5 (yes that’s right, I’m a Brit, so GBP), and today it is maybe ~£0.01 (1 penny) per MB, or ~£10 per GB, and getting lower each year. So yes, tape is always less expensive when compared with disk, by significant magnitudes, but the affordability of disk indicates that it can now be seriously considered for backup and archive data.

As with any technology decision, it should be business requirements that drive the solution chosen, and not an allegiance to a storage media type, tape or disk, or a long time Mainframe tape vendor, IBM or Oracle. Ultimately there is only one thing that differentiates one business from another, and that is the data itself, stored in whatever format, databases, application code libraries, batch flat files, et al. Therefore the cost of storage is somewhat arbitrary; it’s the value of the business data that we should consider, while recognizing capital expenditure and TCO running costs.

The 21st century business seemingly requires near 24*7 service availability and if that business deploys a zSeries (~zero downtime) Mainframe server, I guess we can presume that said business requires near 24*7 data availability. We then must consider Business Continuity and associated Disaster Recovery metrics, which are measured by the Recovery Time Objective (RTO) and Recovery Point Objective (RPO). Ultimately these RTO and RPO values will dictate the required Backup & Recovery and Archive solutions required, where Recovery (time) is the most important factor!

When was the last time you performed a completely successful Disaster Recovery test from a secondary (physical tape, virtual tape disk) copy of data and was the Recovery Time Objective (RTO) satisfied? Was this a complete workload test, where you included on-line, batch and backup (VTL) testing?

From a data categorization viewpoint, industry analysts tell us, if we didn’t know this fact ourselves, that the majority of Mission Critical data is stored in database structures. If we associate other data types with said databases, application code to process the data, policies to manage and safeguard the data and processes to secure and preserve the data, then I guess we have many instances of Mission Critical data.

As the cost of disk has reduced, so has the cost of network bandwidth, so it’s not uncommon for Mainframe customers to mirror/replicate their data between Geographically Dispersed (E.g. GDPS, GDDR) data centres. They deploy this significant investment solution because they have a requirement for near 24*7 service and thus data availability. Therefore their RTO is likely measured in Minutes (E.g. ~5-15), not because the underlying technology can’t deliver a near instantaneous switch, but because the data needs a Point of Consistency (PoC), and this is the “latency time” for delivering a meaningful RPO (E.g. Pre Batch, Post Batch). Mission Critical databases need to establish a Quiesce PoC, to safeguard data consistency.

If the Mainframe user implements this high availability solution for their primary data copy, why wouldn’t they do this for their secondary (E.g. Backup, Archive) data copy? Ultimately there is generally a hierarchy of RTO and RPO objectives, associated with physical and logical failures. A mirrored disk environment only provides rapid recovery (RTO) for a physical component failure, while a logical data failure will manifest itself for all data copies in the mirror topology. Therefore we always have to consider what is our last line of defence for data recovery; typically a secondary backup data copy. Clearly recovering data from a backup, even a disk based backup, generates a significantly higher recovery (RTO) elapsed time. We might also consider data consistency for this backup data copy; namely, has the backup data been completely destaged/written to the target storage device, tape or disk? Of course, if we don’t have a good backup, we can’t recover the data!

OK, we have come full circle to that original question: by definition, a Virtual Tape Library (VTL) solution uses a disk cache to store tape data files, but for how long is this data retained on disk? Is it minutes, hours, days, weeks or indefinitely? Only business requirements can dictate the time period tape data is stored on disk, which will influence the VTL solution chosen.

VTL solutions can be classified as either traditional or tapeless. Traditional is a combination of physical drives and cartridge media in an ATL with a Virtual Tape disk cache (usually proprietary) that is destaged periodically to physical cartridge media, where the primary suppliers are of course IBM with their TS7700 family and Oracle with their VSM offering, while Fujitsu have their CentricStor offering. Tapeless VTL solutions are typically FICON/ESCON channel attached appliances to a back-end disk cache (typically IP, FC or iSCSI), where the tape data is permanently stored on disk. Because the back-end disk cache can be any disk subsystem, within reason, the disk acquisition cost is optimized, because it’s classified as Enterprise/Distributed disk, as opposed to Mainframe disk.

There are many suppliers of tapeless VTL solutions, but the primary vendors are EMC with their Disk Library for Mainframe (DLm) offering and HDS with a layered approach that includes LUMINEX Gateways and HDS disk. EMC recently acquired Bus-Tech, where DLm is an OEM of the Bus-Tech MDL solution, still available via the EMC Select option. IBM, Oracle and Fujitsu also offer tapeless VTL solutions, as and if required, but generally they’re deployed in combination with their traditional physical tape based VTL/ATL offerings. There are also software options, IBM Virtual Tape Facility for Mainframe (VTFM) and CA Vtape, where these software solutions deploy higher cost Mainframe disk as the virtual tape cache.

The majority of VTL solutions benefit from data dedupe functionality, where IBM incorporates their ProtecTIER technology, EMC and HDS incorporate DataDomain technology, while Oracle does not currently support Mainframe dedupe, incorporating a Virtual Library Extension (VLE) as a second tier of VTL disk storage. Ultimately dedupe delivers significant (~10-20:1) data reduction benefits and arguably is mandatory for any large scale Mainframe VTL implementation.
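
To put that data-reduction claim in context, a trivial sizing calculation (with hypothetical figures) shows how much back-end physical disk a given logical backup footprint would need at different dedupe ratios:

```python
# Illustrative back-end disk sizing under different dedupe ratios (hypothetical figures).
logical_backup_tb = 500            # total logical tape/backup data written to the VTL
for ratio in (1, 10, 20):          # no dedupe vs. ~10:1 vs. ~20:1
    physical_tb = logical_backup_tb / ratio
    print(f"{ratio}:1 dedupe -> ~{physical_tb:.0f} TB of physical disk required")
```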

Each and every business must draw their own conclusions for VTL implementations and whether they should be tapeless or not. Most Mainframe users have experienced the benefits of mirrored disk (I.E. IBM PPRC, EMC SRDF, HDS TrueCopy, XRC, et al) and have implemented high-availability solutions with a short-term RTO for physical failures. However, only that business can consider how robust their data recovery processes are for logical data failures, and in the worst case scenario, restoring an entire Mission Critical application from a backup copy. The driving factor for this type of recovery is RTO and where is that “last chance” backup data copy stored, tape or disk storage media, and local, remote or 3rd party data centre?

Just as the business must establish a 1st level RPO and associated RTO for their Mission Critical database structures, typically via a quiesce Point of Consistency (PoC), they must do the same for their 2nd level backup data. If a VTL destages data from disk cache to physical tape, then the time required to create the final physical tape copy will influence the associated RTO, and potentially how much data loss might occur. For the avoidance of doubt, if backup data cannot be destaged to physical tape, then the backup has not been completed and is unusable. Ultimately data loss is not acceptable, whether for a database or a backup copy. So what steps can the Mainframe user take to minimize this risk?

Because tapeless VTL solutions can attach to any disk subsystem, within reason, IT departments generally have their preferred disk supplier and associated processes. Data dedupe significantly reduces disk acquisition cost and associated network transmission costs, while the functional abilities of disk subsystems are typically higher (I.E. Mirroring, Replication) and more robust when compared with tape subsystems.

If the typical Mainframe user has confidence in their disk mirroring solution for physical failure scenarios, generally associated with the primary copy of Mission Critical data, it seems a logical conclusion that they could extend this modus operandi to secondary (E.g. Backup) copies, reducing if not eliminating any data loss concerns.

If the Mainframe user deploys EMC Symmetrix (VMAX) for disk data, they could deploy the DLm 8000 VTL to benefit from SRDF/GDDR functionality; if they deploy HDS USP, they could deploy LUMINEX gateways to benefit from TrueCopy functionality, and so on. There are many options available, when the front-end host connectivity (E.g. FICON, virtual tape drives) is separated from the back-end data store (E.g. IP/FC/iSCSI disk).

Additionally, the smaller Mainframe user that cannot afford hot/warm site recovery facilities can also consider different options for Disaster Recovery solutions. For example, they could deploy a tapeless VTL in their only data centre, benefitting from data dedupe for data reduction, transmitting their backup/archive data via IP (or another network transmission method) into a 3rd party supplier’s facility, duplicating the VTL and disk subsystems to store the data. They can then modify their Disaster Recovery (DR) procedures to invoke DR as and when required, at that point connecting the 3rd party Mainframe resources to the VTL so that data recovery can start immediately. Therefore the traditional off-site DR test at the 3rd party provider’s premises increases in efficiency, while backup data availability is not reliant on the Ford Transit Access Method (FTAM)!

So, how long should secondary copies of Mission Critical data be retained on Virtual Tape disk? Is it minutes, hours, days, weeks or indefinitely? The jury might still be out, but to deliver near 24*7 data availability, for both logical and physical failure scenarios, seemingly at least one secondary copy of Mission Critical data should be retained indefinitely on Virtual Tape disk…

Extended Address Volumes (EAV): Pros & Cons

It wasn’t too long ago that the maximum size of a 3390 DASD volume was ~54 GB (65,520 Cylinders) via the 3390-54.  Then with the release of z/OS 1.10, Extended Address Volumes (EAV) were introduced, and a roughly fourfold (~400%) increase in single device capacity was delivered @ 223 GB (262,668 Cylinders)!  Surely enough storage capacity for anybody?

Of course, we all know that 21st Century data requirements are significant, and so the release of z/OS 1.13 (z/OS 1.12 plus PTFs) has delivered another roughly fourfold (~400%) increase, with a single device capacity of 1 TB (~1.182 Million Cylinders).  However, let’s not forget that data storage capacity can grow by ~20%+ per annum, so I guess it won’t be too long before we see another fourfold increase in size, to ~4 TB+…
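
The capacity figures quoted above follow directly from standard 3390 geometry (56,664 bytes per track, 15 tracks per cylinder); a quick sanity check in Python:

```python
# Worked arithmetic for the 3390 capacity figures quoted above.
BYTES_PER_TRACK = 56_664                 # standard 3390 track capacity
TRACKS_PER_CYLINDER = 15
BYTES_PER_CYLINDER = BYTES_PER_TRACK * TRACKS_PER_CYLINDER   # 849,960 bytes

def cylinders_to_gb(cylinders):
    """Convert a 3390 cylinder count to decimal gigabytes."""
    return cylinders * BYTES_PER_CYLINDER / 1_000_000_000

print(cylinders_to_gb(65_520))      # 3390-54: ~55 GB (commonly quoted as ~54 GB)
print(cylinders_to_gb(262_668))     # z/OS 1.10 EAV: ~223 GB
print(cylinders_to_gb(1_182_006))   # z/OS 1.13 EAV: ~1 TB (~1.182 million cylinders)
```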

EAV implementation relieves disk capacity constraints and allows storage growth without adding more devices.  In today’s world of TCO optimization and a utopia of very short-term ROI, EAV usage will reduce TCO, primarily personnel and environmental (E.g. Power, Cooling, Floor Space) related.  Potentially the ability to manage more data with fewer DASD volumes simplifies the Storage Administration process, therefore increasing the number of TB managed by each technician.  Typically, additional capacity (EAV) can be added dynamically, increasing DASD volume capacity online via the Dynamic Volume Expansion (DVE) function.

Theoretically (as per current architectural constraints) a 3390 EAV can grow to 225 TB; the realm of possibility exists!

The pros of EAV implementation seem obvious, a significant capacity increase in a single footprint, easy implementation, with demonstrable TCO benefits; but is all that glisters always gold?

Learning from history is always a good thing and if we consider the challenges of adopting the 3390-9/27/54 device, did we encounter any capacity optimization issues?  As a single device increases in size, device occupancy might become a challenge.  For example, 90% occupancy of a 3390-54 @ 54 GB is ~48.5 GB, or put another way, ~5.4 GB is allocated but never used.  So if we apply the same metric to a 1 TB device, you guessed it, ~100 GB is allocated and never used…
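
This allocated-but-unused arithmetic generalizes trivially to any device size, as the short illustration below shows:

```python
# Allocated-but-unused space at a given volume occupancy, for various device sizes.
def unused_gb(device_gb, occupancy=0.90):
    """Capacity allocated to the volume that is never actually used."""
    return device_gb * (1 - occupancy)

for size_gb in (54, 223, 1000):     # 3390-54, first-generation EAV, 1 TB EAV
    print(f"{size_gb} GB volume at 90% occupancy: ~{unused_gb(size_gb):.1f} GB unused")
```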

So what, they say.  Indeed the separation of the physical and logical device eliminates any physical space utilization considerations, but what about the number of data sets and, more importantly, extents on that EAV or even 3390-54 DASD volume?  An issue that has plagued many Mainframe installations is disk fragmentation; no matter how big a DASD volume, successful data set allocation is sometimes dependent upon sufficient contiguous extents to satisfy primary allocation or secondary extension.

At first glance, the process of defragmentation is very simple, DFSMSdss DEFRAG, FDR/CPK COMPAKTOR, et al, but typically these processes require minimal data set allocation activity and are batch orientated. DASD enqueue time is a consideration, as these traditional Mainframe defrag solutions can generate significant enqueue activity for the VTOC and data sets alike. Can the 21st Century business that requires near 24*7 data availability allocate sufficient time (E.g. minimal processing window) to perform such manual defragmentation activities? If only defragmentation could be transparent, automated and dynamic…

RealTime Defrag (RTD) is one such option, deploying a multi-faceted approach to delivering “on-line defrag” (a conceptual sketch follows the list below):

  1. Release – Release allocated but unused space for all data set types
  2. Combine – Combine extents, reducing the number of allocated extents for optimized performance and SE37 abend eradication
  3. Defrag – Reorganize data sets into contiguous groups, increasing size of free extents, optimizing performance and SB37 abend eradication
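
As referenced above, here is a toy Python model of the Combine and Defrag ideas, treating extents as (start track, length) pairs on a single volume.  This is purely conceptual and ignores VTOC updates, enqueues and the physical data movement a real product must perform:

```python
# Toy model only: extents are (start_track, length) pairs on a single volume.

def combine_extents(extents):
    """'Combine': relocate a data set's pieces into one contiguous extent of the same total size."""
    total = sum(length for _, length in extents)
    return [(extents[0][0], total)]          # assumes space is available behind the first extent

def coalesce_free_space(free_extents):
    """'Defrag': merge adjacent free extents into the largest possible contiguous areas."""
    merged = []
    for start, length in sorted(free_extents):
        if merged and merged[-1][0] + merged[-1][1] == start:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((start, length))
    return merged

print(combine_extents([(100, 50), (400, 30), (900, 20)]))                 # -> [(100, 100)]
print(coalesce_free_space([(0, 100), (100, 50), (300, 25), (325, 75)]))   # -> [(0, 150), (300, 100)]
```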

In conclusion, EAV deployment can only be a good thing, delivering demonstrable TCO benefits in the form of dramatic single-footprint (I.E. Disk Subsystem) capacity increases.  RealTime Defrag can also increase service availability, eradicating the requirement for manual and batch orientated defrag activities, while safeguarding that installed disk capacity is optimized, EAV or not.