The changes to Daylight Saving Time (DST) in the United States this year raised questions about the way time is handled by various software programs.
The implications of the DST change for WebSphere MQ are fairly straightforward. In summary, there are no issues with the WMQ runtime (on either the servers or clients) as WebSphere MQ uses UTC and is essentially oblivious to DST. There are some issues surrounding the Java runtime environments supplied with WebSphere MQ. This is all well documented in a TechNote on IBM.com.
Despite this, the DST changes this year were a good reminder of the implications of making changes to the system clock on servers. In this post I will discuss some of the implications for WebSphere MQ.
Duplicate MsgIds, CorrelIds, GroupIds, ConnIds
WebSphere MQ generates unique identifiers for a number of values, such as the unique message identifier, MsgId. Although they aren’t straightforward timestamps, part of the value is generated based upon a timestamp in UTC. As such, changes to the system clock do make it possible that the queue manager will generate values for these identifiers that have been used previously.
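As a rough illustration, consider a hypothetical identifier built from a UTC millisecond timestamp and a counter. This is not the actual WMQ identifier algorithm, just a sketch of why stepping the clock back makes repeats possible:

```python
import struct

def make_msg_id(clock_ms, counter):
    # Hypothetical identifier format: a millisecond UTC timestamp packed
    # with a per-millisecond counter. Not the real WMQ layout, but it shows
    # why timestamp-derived values can repeat if the clock is stepped back.
    return struct.pack(">QI", clock_ms, counter)

issued = set()
clock_ms = 1_700_000_000_000
for i in range(3):
    issued.add(make_msg_id(clock_ms + i, 0))

# The administrator steps the clock back one hour; an hour of real time
# later the same timestamps come round again, and an identifier repeats.
repeated = make_msg_id((clock_ms - 3_600_000) + 3_600_000, 0)
print(repeated in issued)  # True
```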
This should not cause any errors within the queue manager. WebSphere MQ allows applications to generate identifiers externally to the queue manager, and such applications may therefore reuse identifiers (although this is something that we strongly discourage). As a result, duplicate identifiers have never been something that we could absolutely prevent, and the queue manager is therefore generally able to handle them.
It is possible that duplicate identifiers may cause problems in the logic of some WebSphere MQ applications which rely on these IDs. For example, it may cause applications to get an incorrect reply message, or for message groups to contain incorrect members, or message sequence numbers to be reused. How significant a problem this will be is entirely dependent on the design of the WebSphere MQ application, and how it handles these values.
WebSphere MQ Publish/Subscribe stores information on SYSTEM.BROKER queues, and uses MsgIds to retrieve it. As such, it is possible that duplicate message identifiers could cause errors in the WebSphere MQ Publish/Subscribe Broker.
Message expiry
Consider a message which has a specified time-to-live after which it becomes eligible to be discarded (if it has not already been got from the destination queue). In these cases, the queue manager stores the time that the message arrives, and uses this to perform comparisons with the current time to identify if a message should expire.
If the system clock changes after such a message is put, then it is possible that messages may be expired too soon, or not expire when intended. This may cause a problem for applications which rely on message expiry.
With persistent messages, the stored arrival time will be written to disk and will be restored after the restart of a queue manager. Restarting a queue manager after changing the system time will therefore not resolve any such problems.
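The comparison described above can be sketched as follows. This is assumed arithmetic for illustration only, not the queue manager's actual implementation:

```python
def is_expired(arrival_s, expiry_s, now_s):
    # Illustrative model: the queue manager records the arrival time of the
    # message and compares it with the current time to decide whether the
    # message's time-to-live has elapsed.
    return (now_s - arrival_s) >= expiry_s

arrival = 1_000.0
ttl = 60.0   # message becomes eligible for discard after 60 seconds

# Clock stepped forward an hour one second after the put: the message
# appears to have been on the queue for over an hour and expires early.
print(is_expired(arrival, ttl, arrival + 1 + 3_600))      # True

# Clock stepped back an hour: two minutes of real time later the message
# has still not expired, even though its time-to-live has long passed.
print(is_expired(arrival, ttl, arrival - 3_600 + 120))    # False
```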
MQGET with WAIT
Consider a WebSphere MQ application which performs an MQGET and specifies that it wants to wait for a message for a period of time before timing-out. If the system clock changes after the MQGET is issued, but before a message is got or the time-out occurs, then the time change may cause some applications to remain in the MQGET call for longer or shorter than was intended.
This is unlikely to cause a significant problem in most instances, however this is again dependent upon the application design. Restarting the queue manager would ensure that any possible issues for this particular problem are prevented.
It is worth noting that waits will generally last for the requested interval regardless of time changes, because WMQ typically requests notification from the operating system after a specified interval rather than at a specific end-time. As a result, WMQ’s behaviour in this regard is ultimately dependent upon how different operating systems handle intervals in the event of time changes, so it cannot be guaranteed and should not be relied upon.
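The difference between the two styles of wait can be shown with a toy simulation (assumed names and 1 ms ticks; this is not WMQ or operating system code):

```python
def simulate_wait(wait_ms, clock_step_at_ms, clock_step_ms, use_deadline):
    # Toy model of a timed wait. The wall clock is stepped by clock_step_ms
    # after clock_step_at_ms of real time has elapsed. Returns how many
    # milliseconds of *real* time the wait actually lasted.
    real = 0
    wall = 0
    deadline = wall + wait_ms        # deadline style checks the wall clock
    remaining = wait_ms              # interval style counts down real ticks
    while True:
        real += 1
        wall += 1
        remaining -= 1
        if real == clock_step_at_ms:
            wall += clock_step_ms    # the system clock is stepped here
        done = (wall >= deadline) if use_deadline else (remaining <= 0)
        if done:
            return real

# An interval-based wait lasts the requested 100 ms of real time...
print(simulate_wait(100, 50, 3_600_000, use_deadline=False))  # 100
# ...but a deadline-based wait ends as soon as the clock jumps forward...
print(simulate_wait(100, 50, 3_600_000, use_deadline=True))   # 50
# ...and lasts longer than requested when the clock is stepped backwards.
print(simulate_wait(100, 50, -40, use_deadline=True))         # 140
```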
Triggering
TriggerInterval is a queue manager attribute used to restrict the number of trigger messages that are generated. It exists to allow for a queue-serving application that ends before it has processed all of the messages on the queue; the trigger interval reduces the number of duplicate trigger messages generated in that situation.
Changes to the system clock during triggering may cause trigger messages to be generated too early or too late. Whether this causes a problem depends on how significantly TriggerInterval is being relied upon in a given environment.
The trigger interval is reset when a queue manager is restarted, so restarting the queue manager after the system time is changed will avoid any possibility for problems in this area.
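The suppression described above can be sketched like this (assumed logic for illustration, not the queue manager's code):

```python
def suppress_trigger(last_trigger_ms, trigger_interval_ms, now_ms):
    # TriggerInterval sketch: a further trigger message is suppressed until
    # the configured interval has elapsed since the last one was generated.
    return (now_ms - last_trigger_ms) < trigger_interval_ms

# Interval of 10 seconds; the last trigger message was cut at t=1000 ms.
print(suppress_trigger(1_000, 10_000, 5_000))               # True (suppressed)
# Clock stepped forward: the next trigger message is generated too early.
print(suppress_trigger(1_000, 10_000, 5_000 + 3_600_000))   # False
# Clock stepped back: trigger messages are suppressed for far too long.
print(suppress_trigger(1_000, 10_000, 11_000 - 3_600_000))  # True
```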
Batch heartbeats
Batch heartbeats allow sender channels to determine whether the remote channel instance is still active before going indoubt.
Changes to the system clock could cause an extra heartbeat to be generated or for one to be missed. It is unlikely that this would cause a problem, however restarting affected channels should be sufficient to resolve any issues that arise.
Batch interval
Changing the system clock during an in-flight batch could cause the batch to be committed earlier (if the clock is moved forwards) or later (if the clock is moved backwards) than intended.
Whether this causes a significant problem will depend on how much the batch interval is relied upon to ensure that batches are committed. For example, in an environment where BatchSize is set so high that it is never reached, then moving the system clock backwards a long way could result in a notable wait before messages are committed. Typically, however, most customers have BatchSize set to a value that is sufficient to avoid this.
This value will be reset when a channel is restarted, so restarting affected channels should be sufficient to resolve any issues that arise.
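A minimal sketch of batch completion, assuming a simplified rule (BatchSize reached, or the batch interval elapsed), shows how a backwards step delays the commit:

```python
def batch_complete(msgs_in_batch, batch_size, batch_start_ms,
                   batch_interval_ms, now_ms):
    # Assumed simplification for illustration: a batch is committed once
    # BatchSize is reached, or once the batch interval has elapsed with
    # messages waiting. Not the actual channel implementation.
    return (msgs_in_batch >= batch_size
            or (now_ms - batch_start_ms) >= batch_interval_ms)

# BatchSize set too high ever to be reached; batch interval of 5 seconds.
print(batch_complete(3, 10_000, 1_000, 5_000, 6_000))               # True
# Clock stepped back an hour mid-batch: the commit is delayed by an hour.
print(batch_complete(3, 10_000, 1_000, 5_000, 6_000 - 3_600_000))   # False
```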
Queue service intervals
Queue service interval events indicate whether a queue was ‘serviced’ within a user-defined time interval. Changing the system time could affect this function – such as causing the queue manager to generate unnecessary queue service interval events if the clock is moved forwards (far enough to cause it to mistakenly think that the queue has not been serviced for too long), or to fail to generate an event if the clock is moved backwards.
Whether this causes a problem will depend on how these events are being handled and used, however it is likely that unexpected queue service interval events will be relatively easy to match up with a server time change.
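The comparison behind these events can be sketched as follows (assumed form, for illustration only):

```python
def service_interval_high(last_serviced_ms, interval_ms, now_ms):
    # Sketch of the service interval check: an event is raised when the
    # queue appears not to have been serviced within the configured
    # interval. Illustrative only, not the queue manager's logic.
    return (now_ms - last_serviced_ms) > interval_ms

# Queue last serviced 400 ms ago, against a 500 ms interval: no event.
print(service_interval_high(1_000, 500, 1_400))              # False
# Clock stepped forward an hour: a spurious service interval high event.
print(service_interval_high(1_000, 500, 1_400 + 3_600_000))  # True
```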
Statistics
WebSphere MQ v6 introduced the collection of a number of new statistics fields, such as QTIME – used to indicate the length of time that messages are staying on a queue. The queue manager reports these values based upon differences between timestamps. As such, in the event that the system time is changed, it is possible that these monitoring values may no longer be reliable.
These values are provided to aid user administration and problem diagnosis, and are not relied upon by the queue manager. As such, it is unlikely that this would cause any significant problems.
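For example, a statistic like QTIME is in essence a difference of two wall-clock timestamps, so a clock step between the put and the get distorts it (hypothetical arithmetic for illustration):

```python
def qtime_us(put_us, get_us):
    # QTIME-style statistic: time on queue computed as the difference
    # between the put timestamp and the get timestamp (illustrative only).
    return get_us - put_us

# Message waited 250 ms on the queue: a sensible value is reported.
print(qtime_us(5_000_000, 5_250_000))                    # 250000
# Clock stepped back an hour between put and get: a meaningless negative
# value is reported, even though the message waited 250 ms of real time.
print(qtime_us(5_000_000, 5_250_000 - 3_600_000_000))    # -3599750000
```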
Displayed timestamps
There are places within WebSphere MQ where timestamps are collected for displaying to the user, such as the creation and last-alteration date of queue managers and queue manager objects. These values are not used by the queue manager for processing, so should not cause any problems other than potentially misleading or confusing a system administrator with values which may appear to be incorrect.
Avoiding problems in the first place
The best practice is to avoid any changes to the system clock in the first place. Once a system has become confused because of changes to the time, there may be no clear way to resolve the situation (e.g. to identify how messages with duplicate message identifiers should have been handled).
Where possible, changes to timezones are preferable to changes to the underlying system clock – as these are not subject to the problems outlined in my post. This is because the WMQ runtime uses UTC rather than local time, as mentioned in my introduction.
If a change to the system clock is unavoidable, a good precaution against the risk of duplicate identifiers mentioned above is to end the queue manager during the period when time will be “repeated” on the server. For example, if you need to move the clock back an hour, then end the queue manager for an hour. In this way, the queue manager should not experience any duplicate timestamps.
It should be possible to run WebSphere MQ on servers using NTP (Network Time Protocol) to keep time synchronized across multiple machines.
The potential for generating duplicate identifiers discussed above can be lessened through the configuration of NTP to favour slewing rather than stepping the system clock. In this way, if the system clock is ahead of the correct time, it will be slowed down to allow the correct time to “catch up” rather than stepping the clock back to the correct time immediately. In this way, queue managers on that server may be less likely to encounter duplicate timestamps.
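The difference can be shown with a toy model of timestamp issue. The slew rate here is an assumed number for illustration; real NTP slew rates are much smaller:

```python
def slewed_times(ticks, error_ms, slew_per_tick_ms=0.5):
    # Illustrative model of NTP slewing: the clock runs slightly slow until
    # the accumulated error is absorbed, so the reported time never moves
    # backwards. (Assumed slew rate; not how ntpd is actually implemented.)
    absorbed, out = 0.0, []
    for t in ticks:
        absorbed = min(error_ms, absorbed + slew_per_tick_ms)
        out.append(t - absorbed)
    return out

already_issued = [0, 1, 2, 3, 4]   # timestamps handed out while 3 ms fast

# Stepping the clock back repeats timestamps that were already used...
stepped = [t - 3 for t in range(5, 10)]
print(sorted(set(already_issued) & set(stepped)))   # [2, 3, 4]

# ...slewing absorbs the same error without revisiting an earlier time.
slewed = slewed_times(range(5, 10), 3)
print(min(slewed) > max(already_issued))            # True
```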
NTP adjustments are typically minor enough that the potential for the other time difference/interval-related issues discussed here are not noticeable.
Implications for non-distributed platforms
Please note that the discussion in this post refers to distributed platforms.
It is worth highlighting that, unlike distributed platforms, there are in fact specific problems when making time changes on iSeries systems running WebSphere MQ due to the use of the journal. This is explained in the WebSphere MQ System Administration Guide for iSeries.
Summary
In this post, I have outlined a number of implications of changes to the time on a system with WebSphere MQ running. This should not be taken as a definitive list of all possible implications, and there may be other issues that I have not considered.
With one exception (potential problems in WebSphere MQ Publish/Subscribe in the event of duplicate MsgId values), none of these implications are issues which would cause errors or problems in the queue manager operation. And most of the issues can be resolved by restarting the queue manager or channels in use.
The implications outlined are generally issues which may cause confusion in the logic of applications connecting to WebSphere MQ if the unexpected behaviour is not handled correctly. For example, a change to system time which causes messages to be got later than intended is not going to cause errors within the queue manager, but may cause problems for the application concerned.
As such, it is worth considering the design of your WebSphere MQ applications in the light of such implications if significant time changes are required on production servers.