Big Data: Is the zSeries Mainframe A Viable Platform?

Noting that ~80% of global corporate data is still managed by IBM Mainframes, doesn’t it make sense that processing this mission critical data should remain local, whenever practicable and pragmatic?

Industry analysts estimate that 90%+ of the typical IT budget is spent maintaining existing applications and their supporting infrastructure. A significant factor is the siloed, duplicated and complex nature of these existing IT environments. Repeating this often unnecessary data duplication and processing for big data implementations will only exacerbate this significant TCO expenditure. It is therefore of primary importance to consider big data from a strategic rather than a purely expedient tactical perspective. Put another way, if big data can be accessed and processed by the incumbent IBM Mainframe environment, why create another siloed environment, requiring more servers, storage, software and associated maintenance expenditure?

It is estimated that ~2.5 Exabytes (2.5 quintillion bytes) of data are created each and every day, meaning that ~90% of the electronic data stored today has been created in the last two years alone. This data comes from numerous sources, largely Internet and mobile telephony based, including social media activity, digital pictures and videos, financial transaction records and cell phone data, to name but a few.

Industry analysts estimate that only ~1% of global data is currently analysed, leaving massive scope for growth in this functional area, namely big data analytics. Such scope suggests exponential and arguably uncontrolled growth in the deployment of big data analytics solutions, generating a significant risk that big data projects will lack management oversight and spiral out of control from a cost viewpoint.

It therefore follows that big data initiatives require careful and strategic planning, not only for immediate short-term requirements, but also for future big data projects that can already be perceived and forecast. Moreover, there needs to be a strategic, scalable, cost efficient and secure infrastructure in place, managing the interrelationships and interdependencies between mission critical data stored on the IBM Mainframe and big data being created from Internet and mobile technologies.

Without such a diligent and structured management framework, IT infrastructure expenditure (TCO) will increase and efficiency will decrease, with the inevitable consequence of siloed environments and duplicated resources, namely servers, software, storage, et al. As always, we must apply the lessons learned from past experiences to avoid these inefficiencies.

Hadoop is seemingly the big data buzzword, being an open source software framework for storing and processing big data in a distributed environment on large clusters of commodity hardware. Ultimately Hadoop delivers two primary functions: massive distributed data storage (HDFS) and high-throughput parallel data processing (MapReduce).
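To make the processing half of that concrete, MapReduce jobs are plain application code. The minimal mapper sketch below counts transactions per account from delimited text records; the comma-separated layout with the account ID in the first field is a hypothetical assumption for illustration only:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Mapper stage of a MapReduce job: for each input record (a line of text),
    // emit (accountId, 1). Hadoop then shuffles these pairs so a reducer can
    // sum the counts per account across the whole cluster.
    public class AccountCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text accountId = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");   // hypothetical CSV layout
            if (fields.length > 0 && !fields[0].isEmpty()) {
                accountId.set(fields[0]);                    // assume field 0 = account ID
                context.write(accountId, ONE);
            }
        }
    }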

In conclusion, the underlying question remains: can mission critical IBM Mainframe data be “coupled” with big data, typically originating from Internet and mobile platforms, to deliver an integrated single image view of customer and/or product data for business benefit?

IBM offers an integrated solution, namely the zEnterprise Analytics System (I.E. 9700, 9710), comprising hardware (E.g. z196/zEC12 or z114/zBC12 Server plus DS8870 Disk) and software (E.g. optimized z/OS software stack), combined with optional services. Primarily, data analytics is delivered by the IBM DB2 Analytics Accelerator solution, incorporating Netezza 1000 product function, allowing for intelligent and rapid data analytics via the DB2 RDBMS. Therefore existing zSeries Mainframe customers can supplement their current IBM Mainframe infrastructure with the IBM DB2 Analytics Accelerator solution, while the realm of possibility exists for a zSeries Mainframe to be deployed for new workloads, via the zEnterprise Analytics System.
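The Accelerator’s key attribute is transparency: applications issue ordinary SQL and the DB2 for z/OS optimizer decides whether an eligible query is routed to the Accelerator. As a minimal sketch, assuming a hypothetical JDBC URL, credentials and SALES table, acceleration is requested via the CURRENT QUERY ACCELERATION special register:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Connects to DB2 for z/OS and runs an aggregate query. With the DB2
    // Analytics Accelerator installed, setting the special register below
    // allows DB2 to route eligible queries to the Accelerator transparently.
    public class AcceleratedQuery {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:db2://zoshost:446/DSNLOCN", "user", "password");   // hypothetical
                 Statement stmt = conn.createStatement()) {
                stmt.execute("SET CURRENT QUERY ACCELERATION = ENABLE");
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT REGION, SUM(AMOUNT) FROM SALES GROUP BY REGION")) {   // hypothetical table
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + ": " + rs.getBigDecimal(2));
                    }
                }
            }
        }
    }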

Resource and cost efficiencies are delivered by combining z/OS and Linux on zEnterprise solutions. Data transfer is reduced by keeping data analytics in the same environment as the mission critical source data (I.E. z/OS), using HiperSockets to process the data between the IBM z/OS and Linux on zEnterprise systems. Overall TCO efficiencies are delivered by exploiting lower cost Linux on zEnterprise resources, where for Sub-Capacity z/OS customers, no software charges will be incurred for the associated CPU processing. This approach leverages existing zEnterprise infrastructure resources, including people and processes, to deploy and support expanding data analytics requirements.
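Notably, HiperSockets require no special programming: both z/OS and Linux on zEnterprise see a HiperSockets LAN as a standard TCP/IP network interface. As a minimal sketch, assuming a hypothetical listener address and record layout, an ordinary socket is all a z/OS-side Java task needs to feed data to a Linux on zEnterprise analytics service:

    import java.io.BufferedWriter;
    import java.io.OutputStreamWriter;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    // Streams a record from z/OS to a Linux on zEnterprise listener. The
    // 10.x.x.x address is assumed to be bound to a HiperSockets interface;
    // the application code itself is ordinary TCP/IP.
    public class HiperSocketsFeed {
        public static void main(String[] args) throws Exception {
            try (Socket socket = new Socket("10.1.1.2", 5000);   // hypothetical HiperSockets LAN address
                 BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                         socket.getOutputStream(), StandardCharsets.UTF_8))) {
                out.write("ACCT0001,2013-07-01,125.50\n");       // hypothetical record layout
                out.flush();
            }
        }
    }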

zSeries Mainframe big data analytics solutions, whether via the packaged zEnterprise Analytics System or via the IBM DB2 Analytics Accelerator solution, deliver benefits including:

  • Optimized I/O Processing: Reducing the complexity and cost of data storage and associated processing by bringing data transformation and analytic processes to the data origin (I.E. zSeries Mainframe)
  • Enterprise Wide Data Availability: Safeguarding operational data accessibility to many users in a timely and cost efficient manner without impacting core business processes
  • Near Real Time Data Processing: Delivering near real time operational analytics with minimal latency and superior Quality of Service (QoS) attributes (I.E. RAS – Reliability, Availability, Serviceability)

Syncsort also provide their DMX-h ETL solution to integrate IBM Mainframe data with Hadoop technologies. Syncsort DMX-h ETL incorporates a library of Use Case Accelerators to implement common ETL tasks, including Mainframe data access, change data capture (CDC), joins, web log aggregations, et al. DMX-h implements a more traditional ETL approach, offloading big data batch workload from the Mainframe to Hadoop platforms, reducing Mainframe MIPS accordingly. Obviously ETL solutions have a long-term history, typically associated with Business Intelligence, Data Warehouse, et al. One must draw one’s own conclusions as to whether ETL solutions contribute to the complexity and cost of managing mission critical business data…

From a business viewpoint, big data analytics delivers benefits, including but not limited to:

  • Optimized & Faster Decision Making: Performing real time analysis of customer transaction and activity data, feedback (E.g. survey and experience) data, et al, can dramatically reduce customer attrition, maintaining existing customer loyalty, while applying the lessons learned to attract new customers.
  • New Products & Services: Customer and associated market research has always provided valuable insight into driving innovation, but these traditional processes are time consuming and error prone. Rapidly analysing real life customer data from Internet and mobile sources delivers an opportunity to offer new products and/or services, seemingly tailored to each customer’s individual requirements.
  • Cost Reduction: Performed well, big data analytics can clearly deliver significant cost reduction for the business, reducing product/service development time, while retaining existing customers and attracting new ones. However, done badly, data analytics could be a significant drain on the IT expenditure budget.

As always, the zSeries Mainframe delivers an integrated, scalable, secure and cost efficient solution for big data initiatives, even Hadoop, typically perceived as a Distributed Systems solution. Without doubt, big data solutions will be implemented by each and every major global company in the short-term, while pragmatic and careful planning will reduce the associated IT implementation and administration cost. With a legacy of several decades or more delivering enterprise wide solutions, arguably seasoned IBM Mainframe personnel are ideally placed to participate in the design and delivery of big data analytics projects!

IFL – A Cost Efficient zSeries Platform?

In September 2000, IBM introduced the Integrated Facility for Linux (IFL) processor, a specialty engine for, some might say dedicated to, running the Linux Operating System.  At the time of this announcement, companion software named S/390 Virtual Image Facility for Linux was introduced to assist in the rapid deployment of IFL configurations, especially for non-Mainframe personnel.  However, this product was quickly discontinued in favour of the standard z/VM Operating System, which is not difficult to learn and can accommodate hundreds if not thousands of zLinux images.

Today, the IFL is still a processor dedicated to Linux workloads on IBM System z servers.  The IFL is supported by z/VM virtualization and the Linux operating system.  The IFL cannot run other IBM operating systems.  The competitively priced IFL processor is a CPU capacity enabler, exclusively for Linux workloads.  Linux deployment (I.E. SUSE & Red Hat) on IFLs can reduce expenses in the areas of operational effort, energy, floor space and especially software.

The IFL provides the following functions and benefits:

  • The IBM Enterprise Linux Server is a dedicated System z Linux server, comprised of only IFL processors
  • No additional IBM software charges for the traditional (E.g. z/OS, CICS, DB2, WebSphere, et al) environment
  • Performance improvement for Linux workloads with each successive generation of IFL and System z technology
  • Linux workload on the IFL does not result in increased IBM software charges for traditional System z operating systems and middleware
  • Same functionality as a General Purpose processor on a System z server
  • HiperSockets can be used for communication between Linux images, or Linux and other operating system images on the same System z system
  • z/VM virtualization and most IBM Linux middleware products, plus most vendor software products are priced per processor (core) according to the System z IBM International Program License Agreement (IPLA).  IPLA products have a one-time-charge (OTC) and an annual (optional) maintenance charge, called Subscription & Support
  • Supported by the current z/VM virtualization and IBM Wave for z/VM software versions
  • Always a full capacity processor, independent of the capacity of the other processors in the server
  • Orderable as a System z hardware feature. The number of orderable IFL features varies by the server model and configuration
  • Designed to operate asynchronously with other General Purpose processors
  • Managed by PR/SM in a logical partition with dedicated or shared processors. The implementation of an IFL requires a Logical Partition (LPAR) definition, following the normal LPAR activation procedure; an LPAR-defined IFL cannot be shared with a General Purpose processor

There will always be the debate as to which processor and associated server type (E.g. x86, POWER, SPARC) is the most cost efficient, but there is no doubt that the ability to accommodate hundreds if not thousands of zLinux instances in one environmentally friendly (E.g. power, cooling, floor space, et al) zServer footprint, with software pricing per core, is worthy of consideration.

Adoption of zLinux has been steady, especially in emerging territories, where it’s not unusual for zSeries deployments to be totally zLinux (I.E. IBM Enterprise Linux Server) based.  Moreover, the majority of large and traditional IBM Mainframe users (I.E. z/OS) have installed at least one IFL, if only to evaluate the z/VM and zLinux offering.  Many have deployed the IFL and associated zLinux solution for business requirements.

Therefore, if one of the major cost benefits of the IFL is optimized software costs, can the IFL processor be considered for other workloads, originating from the traditional zSeries (I.E. z/OS) environment?

Proximal Systems Corporation (PSC) is a company with a solution that transparently offloads data processing from IBM Mainframes to Distributed Systems, with an objective of reducing software cost, while maintaining or improving performance.  The company name is derived from the concept of bringing disparate computing systems into close proximity, functionally speaking, providing totally seamless and transparent interoperability.  The result is a unified computing complex within which various tasks can be easily migrated between systems to their most cost efficient operating environment, while still being able to interoperate as if they were all hosted together on the same system.

The PSC Proxy Coupling Technology allows a CPU orientated task to be offloaded from one system to another by means of an associated proxy task, which has an identical interface to the task being offloaded, but delegates the majority of the processing to an offloaded task on another system.  The primary objectives of this function are the cost savings and/or performance improvements that might be delivered by migrating tasks to systems that are able to execute those tasks more efficiently.

The fact that the proxy task maintains the same interface as the application being replaced is crucial, as many past Mainframe migration projects have failed due to insurmountable interoperability problems between the Mainframe and Distributed Systems servers (I.E. Windows, Linux, UNIX, et al).  Proxy Coupling Technology offers a solution to this long-standing challenge.  In theory, this allows for the transparent offload of a traditional z/OS workload (E.g. Sort) from General Purpose (GP) processors, to a less expensive (E.g. IFL) alternative…

In the first instance, the Proxy Coupling Technology offloads the General Purpose CPU workload associated with the z/OS sort (I.E. CA Sort, DFSORT, Syncsort) function to another platform (E.g. IFL).  For IFL based implementations, HiperSockets are utilized to transfer data at memory speeds from the z/OS task to zLinux on the IFL, where the sort operation completes, while the resulting z/OS task and associated data are maintained as normal.  From an IFL viewpoint, the Ahlsort software performs the sort operation, being a sort solution that maintains compatibility with the majority of z/OS sort functions (I.E. control card syntax, E.g. SORT FIELDS=(1,10,CH,A)).  Therefore, this is a transparent implementation, where the only consideration is how much CPU capacity is required for the offload function (E.g. IFL, x86).  The benefit is reduced z/OS MSU usage for the sort function, which can be quite significant, as most business data (E.g. Database Offloads, Customer Orientated, et al) is sorted on a daily if not more frequent basis.
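For illustration only, the sketch below shows the general shape of such a proxy: a routine that preserves the caller’s “records in, sorted records out” contract, while delegating the actual sort to a remote engine (E.g. Ahlsort on zLinux) over a TCP connection that HiperSockets could carry.  The host name, port and line-orientated protocol are hypothetical assumptions, not a description of the PSC implementation:

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;

    // Conceptual proxy sort: same interface to the caller as a local sort,
    // but the work happens on a remote sort engine. Host, port and the
    // line-orientated protocol are hypothetical.
    public class ProxySortClient {
        public static List<String> sort(List<String> records) throws IOException {
            try (Socket socket = new Socket("linux-ifl", 7000);
                 BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                         socket.getOutputStream(), StandardCharsets.UTF_8));
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                         socket.getInputStream(), StandardCharsets.UTF_8))) {
                for (String record : records) {   // ship unsorted records to the sort engine
                    out.write(record);
                    out.newLine();
                }
                out.flush();
                socket.shutdownOutput();          // end-of-input marker for the server
                List<String> sorted = new ArrayList<>();
                String line;
                while ((line = in.readLine()) != null) {
                    sorted.add(line);             // collect the sorted result
                }
                return sorted;
            }
        }
    }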

Just as IBM introduced the zAAP on zIIP capability, which allowed some customers to more easily justify a specialty engine (I.E. zIIP) by combining workloads to exploit the full capability of that engine, in theory the same ethos applies to the Proxy Coupling Technology.  For the avoidance of doubt, workloads that can be processed on an IFL, such as z/OS sort tasks, can assist in delivering higher Return On Investment (ROI) levels for the IFL, for example:

  • Reduced z/OS WLC MSU usage (I.E. Sort function offload) and associated software costs savings
  • IFL processors run at Full Speed and do not add to traditional workload (I.E. z/OS) software costs
  • Utilize any spare IFL CPU resource, releasing General Purpose CPU resource for other work

In conclusion, the Proxy Coupling Technology offers a proposition that is similar to the IBM philosophy of reducing z/OS software costs via specialty engines.  To date, seemingly only the zIIP and zAAP specialty engines have been available to optimize CPU usage for z/OS workloads.  Offloading CPU cycles, and thus MSU workload, to the IFL makes sense, utilizing a cost efficient and indeed full power CPU engine, where for cost reasons the majority of z/OS customers maybe don’t deploy the “highest” derivative of General Purpose CPU engine available to them.  On the face of it, the realm of possibility exists for other workloads to benefit from z/OS to IFL CPU offload; sort seems to make sense as the first workload to utilize this solution.