Having worked as a CPT Global mainframe consultant over the past few years, I've learned from some of the top experts in the world and gained valuable insights into effectively managing mainframe resources. One of the most important lessons I've learned during this time is how to balance workloads to lower operating costs.
Project after project, a common issue our clients face is truly understanding their mainframe workloads. Despite knowing when different types of workloads run, many IT shops don't understand the impact these workloads have on their system and how they can be harnessed more efficiently.
Mainframes are highly capable machines. Our job is to direct their behavior. In this article, we look at how to get your workloads under control with a strong discretionary policy, and how this helps improve your mainframe's capacity while saving your business money.
One of the first things we do when assessing a client's mainframe environment is to determine their peak usage times. From there, we map out the workload patterns and identify periods where utilization drops significantly. This is what we call “white space” – periods where your system is underutilized.
Many IT shops have excess capacity without even knowing it – and it's often because businesses tend to focus on consumption spikes and not the space before and after these spikes happen. White space is important because the more we understand it, the more efficiently we can run our mainframes.
Identifying white space isn't difficult. It may occur during off-peak hours, on weekends or even at certain times during the day when there is a period of reduced activity. By recognizing these emerging patterns, we gain a deeper understanding of how your mainframe is being utilized and can make informed decisions on optimizing its efficiency and effectiveness.
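To make this concrete, here is a minimal sketch of what white-space identification can look like, assuming hourly utilization figures of the kind you might pull from SMF/RMF reporting. The numbers and the 60% threshold below are invented for illustration, not real client data:

```python
# Hypothetical sketch: finding "white space" in hourly utilization data.
# Utilization figures and the 60% threshold are illustrative only.

hourly_util = {  # hour of day -> average CPU utilization (%)
    0: 35, 1: 30, 2: 28, 3: 25, 4: 40, 5: 55,
    6: 80, 7: 90, 8: 95, 9: 92, 10: 88, 11: 85,
    12: 70, 13: 75, 14: 82, 15: 86, 16: 90, 17: 78,
    18: 60, 19: 50, 20: 45, 21: 40, 22: 38, 23: 36,
}

WHITE_SPACE_THRESHOLD = 60  # % utilization below which we call it white space

def find_white_space(util_by_hour, threshold):
    """Return the hours where utilization sits below the threshold."""
    return sorted(h for h, u in util_by_hour.items() if u < threshold)

print(find_white_space(hourly_util, WHITE_SPACE_THRESHOLD))
```

Plotted over a week or a month, the same idea reveals the recurring low-utilization windows the article describes.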
Once we have identified the white space in your mainframe workload, it's time to create a discretionary policy that takes advantage of it. This policy should outline how your mainframe resources are allocated and prioritized, particularly in periods of reduced activity.
A discretionary policy is a set of rules or guidelines that determine how your mainframe resources are used during times of high and low demand. It should identify which workloads should run during these periods and which can be delayed until peak usage times.
Mainframers think of discretionary as a way of telling our systems what to de-prioritize when resources are slim. We also use discretionary policies to ‘squash’ spikes in our white space, enabling us to use the excess capacity available after the spikes to process less important or “delayed” workloads.
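As a rough illustration, a discretionary policy can be thought of as a rule table mapping workloads to importance levels. In a real shop this lives in WLM service-class definitions, not application code; the workload names and the Python shape below are hypothetical:

```python
# Hypothetical sketch of a discretionary policy as a simple rule table.
# Real z/OS shops express this in WLM service-class definitions; the
# workload names here are made up for illustration.

POLICY = {
    "ONLINE_CICS":     "high",           # must always meet goals
    "DB2_PROD":        "high",
    "NIGHTLY_REPORTS": "discretionary",  # can be delayed past its usual window
    "USER_BATCH":      "discretionary",
    "SMF_COPY":        "discretionary",
}

def runnable_now(workloads, system_busy):
    """During high demand, run only non-discretionary work; otherwise run all."""
    if system_busy:
        return [w for w in workloads if POLICY.get(w) != "discretionary"]
    return list(workloads)
```

The point of the table is the decision it encodes: when resources are slim, discretionary work simply drops out of contention.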
The first and most obvious benefit of using this approach is that it can protect your capacity during spikes. By telling our system that it’s OK to delay some workloads until resources are more available, we ensure that truly important tasks get completed on time even when our system is running at max capacity.
The second and less obvious benefit is that by following this strategy, you can also save your business hundreds of thousands or even millions of dollars in monthly licensing fees.
z/OS will always use the resources it's given. One way we can control capacity and manage costs is by implementing capping. This enables us to artificially limit the system's capacity during peak periods, ensuring predictable usage. By reducing the capacity, we can immediately save on any costs associated with peak usage. Plus, it becomes much easier to predict the maximum charge we'll incur.
When we cap the system and reach that limit, any discretionary workloads will be temporarily delayed to stay within the cap. This way, we can prioritize more important tasks while still managing our resources effectively.
However, understanding our workloads and identifying which tasks can be delayed or put on hold for a short time is crucial. It's not about creating a backlog that sits for hours waiting for capacity to free up. As soon as utilization drops below the cap, discretionary tasks receive the leftover resources. They may start a bit later and take a bit longer to complete, since they only consume what's left over, but the result is still the same – all critical tasks get done.
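The behavior described above can be sketched as a toy simulation, assuming a hypothetical 400-MIPS cap: critical work runs first each interval, and discretionary work drains only through whatever capacity is left under the cap. The MIPS figures and task mix are invented:

```python
# Minimal simulation of capping: discretionary work only consumes whatever
# capacity is left after critical work, so it finishes later but still finishes.
# The 400-MIPS cap and the demand figures are invented for illustration.

CAP = 400  # hypothetical MIPS cap on the system

def run_interval(critical_demand, discretionary_backlog):
    """One interval: critical work runs first, discretionary gets the leftovers."""
    critical_used = min(critical_demand, CAP)
    leftover = CAP - critical_used
    discretionary_done = min(discretionary_backlog, leftover)
    return critical_used, discretionary_backlog - discretionary_done

# A spike interval: critical demand fills the cap, so discretionary waits.
used, backlog = run_interval(critical_demand=420, discretionary_backlog=100)
# A quieter interval afterwards: leftover capacity drains the backlog.
used, backlog = run_interval(critical_demand=250, discretionary_backlog=backlog)
```

Note that the critical work never exceeds the cap, and the discretionary backlog clears as soon as a quieter interval arrives – exactly the "delayed, not abandoned" behavior a discretionary policy is after.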
For example, if we have a task that usually finishes at 6 am but can just as well finish at 10 am without any negative impact, that's a great candidate to start with. Tasks like overnight report processing, user-submitted batch jobs, and copying of SMF data are good starting points to consider.
By understanding our workloads and effectively managing capacity, we can optimize our system's performance and control costs.
Once we identify and assign a task to discretionary, we immediately gain peace of mind. We no longer have to worry about expected or unexpected spikes overwhelming our system during resource-scarce moments, and we don't need to take drastic measures like adding another engine. Just that alone is valuable.
At this point, we can treat discretionary workloads as if they don't exist at all. It's like saving MIPS on a system that hits 100% usage – and if we never reach 100%, it might be time to consider reducing capacity.
A good discretionary policy not only helps us analyze the impact of spikes on discretionary workloads but also allows us to effectively manage system capacity. If spikes reach 200 MIPS, it's time to cap the system by 100 MIPS. However, it's important to proceed in steps, reducing by 50 MIPS at a time and analyzing the impact after each reduction. The objective is to ensure that non-discretionary tasks still meet our desired performance goals.
When Workload Manager (WLM) is properly tuned, each service class's performance index (PI) should remain stable at or close to one. Delays should only be observed in our discretionary workload. If more important tasks begin to show performance degradation, approach further capping cautiously. Capping serves two purposes here: reducing costs associated with active capacity usage and assessing the effects of a smaller-capacity system.
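The stepwise approach above can be sketched as a simple loop: lower the cap 50 MIPS at a time, and stop as soon as non-discretionary work's performance index slips past one. The `measure_performance_index()` callback below is hypothetical – in practice it stands in for reviewing WLM/RMF measurements after each reduction:

```python
# Sketch of the stepwise capping approach: reduce the cap in 50-MIPS steps,
# checking after each step that non-discretionary work still meets its goals.
# measure_performance_index() is a hypothetical stand-in for real WLM/RMF data.

STEP = 50              # MIPS removed per step
TARGET_REDUCTION = 100 # total MIPS we'd like to remove
PI_LIMIT = 1.0         # PI we want non-discretionary work to hold

def plan_capping(current_cap, measure_performance_index):
    """Lower the cap stepwise, stopping if important work starts to degrade."""
    cap = current_cap
    for _ in range(TARGET_REDUCTION // STEP):
        trial_cap = cap - STEP
        if measure_performance_index(trial_cap) > PI_LIMIT:
            break  # important work degrading: stop capping further
        cap = trial_cap
    return cap
```

For example, if the PI stays healthy down to a certain capacity and degrades below it, the loop settles on that capacity rather than forcing the full target reduction.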
Over time, teaming a good discretionary policy with an equally strong capping strategy enables us to simulate a smaller system. This gives your business the option to avoid a capacity increase or even reduce capacity with your next upgrade. Beyond the obvious hardware cost avoidance of a larger system, this will also reduce software costs based on installed capacity. Anyone familiar with software costs understands that this is where true savings are to be had.
Reaping the benefits of this simple strategy starts with understanding your workload. While identifying less important workloads and developing a good discretionary policy around them can feel intimidating, it's absolutely achievable – and ultimately results in better performance during resource constraints.
Taken further, capping will reduce costs related to active capacity, giving you the option to avoid capacity increases and even consider capacity reductions. Any reduction in installed capacity naturally leads to further hardware and software cost reductions based on installed capacity, potentially saving your business millions of dollars annually.