One of the topics requested a couple of times last week was the information in FFST files.

In this post, I will try to give an introduction to these files, and identify some actions which can be taken in the event of an FFST file being generated.

What is FFST?

FFST stands for First Failure Support Technology, and is technology within WebSphere MQ designed to create detailed reports for IBM Service with information about the current state of a part of a queue manager together with historical data.

What are they for?

They are used to report unexpected events or states encountered by WebSphere MQ. (Alternatively, they can be generated upon request).

Note that return codes are used for application programmers to inform them of expected states or errors in a WebSphere MQ application. There are exceptions to this rule, but as a rule of thumb, FFSTs are used to report something that will need to be actioned by:

  • system administrators – for example, where FFSTs report resource issues such as running low on disk space
  • IBM – where FFSTs report a potential code error in WebSphere MQ that (unless already identified and corrected in existing maintenance) may need correcting

Where are they?

On Windows, they are typically written to C:\Program Files\IBM\WebSphere MQ\errors
On UNIX systems, they are written to /var/mqm/errors

They are contained in files with the extension .FDC

The file name will begin with AMQ followed by the process id for the process which reported the error.

e.g.
/var/mqm/errors/AMQ21449.0.FDC – is the first FFST file produced by a process with ID 21449
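
If you want to pick the process ID out of a file name in a script, something like the following Python sketch might do. It assumes the name always looks like AMQ, then the process ID, then a number, then .FDC. Treating the middle number as a per-process sequence number is my reading of the example above rather than a documented rule, and parse_fdc_name is just a name I made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: AMQ<pid>.<sequence>.FDC, as in AMQ21449.0.FDC
FDC_NAME = re.compile(r"^AMQ(\d+)\.(\d+)\.FDC$")


class FdcName(NamedTuple):
    pid: int
    sequence: int


def parse_fdc_name(path: str) -> Optional[FdcName]:
    """Return the pid and sequence number from an FDC file name, or None."""
    # Accept both UNIX and Windows style paths and keep only the last part
    name = path.replace("\\", "/").rsplit("/", 1)[-1]
    match = FDC_NAME.match(name)
    if match is None:
        return None
    return FdcName(pid=int(match.group(1)), sequence=int(match.group(2)))


print(parse_fdc_name("/var/mqm/errors/AMQ21449.0.FDC"))
```

Anything that does not match the assumed pattern (an error log, for instance) comes back as None rather than raising an error.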

What do they contain?

FFST files are text files containing error reports for a single process.

If a single process produces more than one error report, these are all included within the same FFST file for that process, in the order in which they were generated.
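
As a rough illustration of what that means for a script, here is a small Python sketch that splits the text of one FDC file into separate reports. It assumes every report contains the title line "WebSphere MQ First Failure Symptom Report" (as in the example report later in this post) and starts a new chunk each time it sees one. split_reports is a hypothetical helper, not part of WebSphere MQ.

```python
from typing import List

REPORT_TITLE = "WebSphere MQ First Failure Symptom Report"


def split_reports(text: str) -> List[str]:
    """Split FDC file text into one chunk per report, in file order."""
    chunks: List[List[str]] = []
    for line in text.splitlines():
        if REPORT_TITLE in line:
            chunks.append([])      # a title line opens a new report
        if chunks:                 # ignore anything before the first title
            chunks[-1].append(line)
    # Note: each chunk starts at the title line, so the box border drawn
    # just above a later report ends up at the tail of the previous chunk.
    return ["\n".join(chunk) for chunk in chunks]


sample = """\
| WebSphere MQ First Failure Symptom Report |
| Probe Id          :- XC338001 |
MQM Function Stack
| WebSphere MQ First Failure Symptom Report |
| Probe Id          :- XC338001 |
"""
reports = split_reports(sample)
print(len(reports))
```

Because reports are appended in the order they were generated, the last chunk is the most recent report from that process.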

How should I look at these files?

FFST files are just text files, so your favourite text editor is normally the best place to start.

The tool ffstsummary is also useful – it produces a summary of FFST reports in the current directory, sorted into time order. This can be a good place to start to see the errors reported in your errors directory.

For example:

[dalelane@dlane ~]$ cd /var/mqm/errors
[dalelane@dlane errors]$ ffstsummary
AMQ21433.0.FDC 2007/04/10 10:05:45 amqzdmaa 21433 2 XC338001 xehAsySignalHandler xecE_W_UNEXPECTED_ASYNC_SIGNAL OK
AMQ21429.0.FDC 2007/04/10 10:05:45 amqzmur0 21429 2 XC338001 xehAsySignalHandler xecE_W_UNEXPECTED_ASYNC_SIGNAL OK
AMQ21469.0.FDC 2007/04/10 10:05:45 runmqlsr 21469 2 XC338001 xehAsySignalHandler xecE_W_UNEXPECTED_ASYNC_SIGNAL OK
AMQ21422.0.FDC 2007/04/10 10:05:45 amqzfuma 21422 2 XC338001 xehAsySignalHandler xecE_W_UNEXPECTED_ASYNC_SIGNAL OK
AMQ21424.0.FDC 2007/04/10 10:05:45 amqzmuc0 21424 2 XC338001 xehAsySignalHandler xecE_W_UNEXPECTED_ASYNC_SIGNAL OK
AMQ21431.0.FDC 2007/04/10 10:05:45 amqrrmfa 21431 2 XC338001 xehAsySignalHandler xecE_W_UNEXPECTED_ASYNC_SIGNAL OK
AMQ21449.0.FDC 2007/04/10 10:05:45 amqzlaa0 21449 2 XC338001 xehAsySignalHandler xecE_W_UNEXPECTED_ASYNC_SIGNAL OK
AMQ21434.0.FDC 2007/04/10 10:05:45 amqzmgr0 21434 2 XC338001 xehAsySignalHandler xecE_W_UNEXPECTED_ASYNC_SIGNAL OK
AMQ21452.0.FDC 2007/04/10 10:05:45 runmqchi 21452 2 XC338001 xehAsySignalHandler xecE_W_UNEXPECTED_ASYNC_SIGNAL OK
AMQ21417.0.FDC 2007/04/10 10:05:45 amqzxma0 21417 4 XC338001 xehAsySignalHandler xecE_W_UNEXPECTED_ASYNC_SIGNAL OK
[dalelane@dlane errors]$

The columns in the output above show:

  • filename – which FDC file contains the FFST report
  • time and date of the report
  • process name – name of the process which produced the report
  • process and thread ids – for the process which produced the report
  • probe id – which I will talk about more later
  • component – part of WebSphere MQ where the report was produced
  • error code – major errorcode and minor code
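
If you want to work with this output in a script, the columns can be pulled apart with a few lines of Python. This is only a sketch based on the sample output above: it assumes ten whitespace-separated columns and that the date is written year/month/day. SummaryRow and parse_summary_line are names I have invented for the example.

```python
from datetime import datetime
from typing import NamedTuple


class SummaryRow(NamedTuple):
    filename: str
    timestamp: datetime
    process_name: str
    pid: int
    thread_id: int
    probe_id: str
    component: str
    major_code: str
    minor_code: str


def parse_summary_line(line: str) -> SummaryRow:
    """Parse one row of ffstsummary output (layout assumed from the sample)."""
    parts = line.split()
    if len(parts) != 10:
        raise ValueError("expected 10 columns, got %d" % len(parts))
    filename, date, time, name, pid, tid, probe, component, major, minor = parts
    # Assumes the date column is year/month/day
    stamp = datetime.strptime(f"{date} {time}", "%Y/%m/%d %H:%M:%S")
    return SummaryRow(filename, stamp, name, int(pid), int(tid),
                      probe, component, major, minor)


row = parse_summary_line(
    "AMQ21417.0.FDC 2007/04/10 10:05:45 amqzxma0 21417 4 XC338001 "
    "xehAsySignalHandler xecE_W_UNEXPECTED_ASYNC_SIGNAL OK"
)
print(row.process_name, row.pid, row.thread_id, row.probe_id)
```

Bear in mind the caveat at the end of this post: the layout may change, so a script like this should fail loudly (as the ValueError does here) rather than guess.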

What does an FFST report contain?

I generated the following FFST by pressing Ctrl-C to interrupt a channel listener (runmqlsr) process which I kicked off from a command shell. This is a nice way to generate an FFST on UNIX systems for you to have a look at (although you can manually generate an FFST from any WebSphere MQ process).

I’ve added some numbers on the left to mark out points worth noting…


        +-----------------------------------------------------------------------------+
        |                                                                             |
        | WebSphere MQ First Failure Symptom Report                                   |
        | =========================================                                   |
        |                                                                             |
(1)     | Date/Time         :- Wednesday May 02 13:25:56 BST 2007                     |
(2)     | Host Name         :- jolene.hursley.ibm.com (Linux 2.6.9-42.0.10.EL)        |
        | PIDS              :- 5724H7207                                              |
(3)     | LVLS              :- 6.0.2.0                                                |
        | Product Long Name :- WebSphere MQ for Linux (POWER platform)                |
        | Vendor            :- IBM                                                    |
(4)     | Probe Id          :- XC338001                                               |
        | Application Name  :- MQM                                                    |
(5)     | Component         :- xehAsySignalHandler                                    |
(6)     | SCCS Info         :- lib/cs/unix/amqxerrx.c, 1.214.1.4                      |
        | Line Number       :- 737                                                    |
        | Build Date        :- Sep 21 2006                                            |
        | CMVC level        :- p600-200-060921                                        |
        | Build Type        :- IKAP - (Production)                                    |
(7)     | UserID            :- 00011243 (root)                                        |
(8)     | Program Name      :- runmqlsr                                               |
        | Addressing mode   :- 64-bit                                                 |
(9)     | Process           :- 16337                                                  |
        | Thread-Process    :- 16337                                                  |
(10)    | Thread            :- 2                                                      |
        | ThreadingModel    :- PosixThreads                                           |
(11)    | Major Errorcode   :- xecE_W_UNEXPECTED_ASYNC_SIGNAL                         |
        | Minor Errorcode   :- OK                                                     |
        | Probe Type        :- MSGAMQ6209                                             |
        | Probe Severity    :- 3                                                      |
(12)    | Probe Description :- AMQ6209: An unexpected asynchronous signal (2 :        |
        |   SIGINT) has been received and ignored.                                    |
        | FDCSequenceNumber :- 0                                                      |
        | Arith1            :- 2 2                                                    |
(13)    | Comment1          :- SIGINT                                                 |
        | Comment2          :- Signal sent by pid 0                                   |
        |                                                                             |
        +-----------------------------------------------------------------------------+
        
(14)    MQM Function Stack
        xehAsySignalMonitor
        xehHandleAsySignal
        xcsFFST
        
(15)    MQM Trace History
        { xppInitialiseDestructorRegistrations
        } xppInitialiseDestructorRegistrations rc=OK
        { xehAsySignalMonitor
        -{ xcsGetEnvironmentInteger
        --{ xcsGetEnvironmentString
        --} xcsGetEnvironmentString rc=xecE_E_ENV_VAR_NOT_FOUND

(16)    Process Control Block
        0x80006ad890   58494850  000029E8  00003FD1  00000004    XIHP..)...?.....
        0x80006ad8a0   00000000  10029F70  00000000  10033A50    .......p......:P
        0x80006ad8b0   00000000  00000000  00000000  00000000    ................
        0x80006ad8c0 to 0x80006ad900 suppressed, 5 lines same as above
        0x80006ad910   00000000  00000001  00000000  00000000    ................
        0x80006ad920   00000000  00000000  00000000  00000000    ................
        0x80006ad930 to 0x80006ad9d0 suppressed, 11 lines same as above
        0x80006ad9e0   00000000  00000000  00000001  00568001    .............V..
        0x80006ad9f0   00FB8000  00000000  00000080  00760000    .............v..
        0x80006ada00   00000000  00000000  00000000  00000000    ................
        0x80006ada10 to 0x80006ae9f0 suppressed, 255 lines same as above
        0x80006aea00   00000000  FFFFFFFF  FFFFFFFF  00000000    ................
        0x80006aea10   00000000  00000000  00000001  FFFFFFFE    ................
        0x80006aea20   00000001  00000000  00000000  00000000    ................
        0x80006aea30   00000080  0069A380  00000000  00000000    .....i..........
        0x80006aea40   00000000  00000000  00000000  00000000    ................

etc

(17)    Environment Variables:
        MANPATH=/opt/csm/man:
        HOSTNAME=jolene.hursley.ibm.com
        TERM=xterm
        SHELL=/bin/bash
        HISTSIZE=1000
        SSH_CLIENT=::ffff:9.20.94.90 2625 22
        QTDIR=/usr/lib/qt-3.3
        OLDPWD=/root
        SSH_TTY=/dev/pts/1
        USER=root
        LS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40...
        KDEDIR=/usr
        MAIL=/var/spool/mail/root
        PATH=/usr/kerberos/sbin:/usr/kerberos/bin:/opt/csm/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:...
        INPUTRC=/etc/inputrc
        PWD=/var/mqm/errors
        LANG=en_GB.UTF-8
        SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
        SHLVL=1
        HOME=/root
        LOGNAME=root
        SSH_CONNECTION=::ffff:9.20.94.90 2625 ::ffff:9.20.63.20 22
        LESSOPEN=|/usr/bin/lesspipe.sh %s
        G_BROKEN_FILENAMES=1
        _=/usr/bin/runmqlsr

The numbered points marked above are:
  1. date and time that this report was produced
    For many problems, this is the most useful piece of information – allowing an error report to be correlated with other known events.
  2. hostname for the machine where this report was produced
  3. version and maintenance level for WebSphere MQ
    This is useful when comparing an error report against a documented known problem.
  4. probe ID
    This is an internal method of identifying the error report. It identifies a single point in the WebSphere MQ source code where the report was produced (consisting of two letters giving a component code, a three digit function code, and a three digit probe identifier).
    This often makes it the best way to uniquely identify the error that the report is describing. More on this a bit later…
  5. component
    This is the bit of WebSphere MQ which produced the report. As with the source information below, it is generally more useful to us than it is to users, although the name can sometimes give a useful hint as to the nature of the error report. For example, in this case where the report is the result of my using Control-C to generate an interrupt signal, you can see that the component which produced the report was a signal handler.
  6. source information
    Although this information isn’t useful to users, I thought it might be interesting to highlight that an FFST identifies exactly where it was produced, down to the source code file, line number and version.
  7. user id that was running the process which produced the report
    This is useful to confirm whether a problem was the result of insufficient user privileges.
  8. process name of process which produced the report
  9. process id for the process which produced the report
  10. thread id, within that process, for the thread which produced the report
  11. error codes for the report
  12. a longer description of the error code for the report
    This is a textual (English) description containing information that a WebSphere MQ developer thought might be helpful if the situation were to occur. Sometimes this information may be useful to users, such as messages identifying an operating system function which has failed and what the error code was. Other times, it will only be useful to IBM Service.
  13. additional comment information
  14. function stack for the process at the time of the report
  15. a history of function calls made by the process leading up to the report
  16. a series of dumps
    In the WebSphere MQ source code, functions can register data items that may be of interest. If a function has something that could be useful (such as in diagnosing or debugging a problem), it can register it with the engine that produces FFST reports. This means that in the event of an FFST being produced, this data will be included. These items are deregistered when the function completes.
    This is normally of more use to IBM Service than users, however there may be times – such as when some message data is included – when you will recognise some of the data here.
  17. environment variables for the environment of the process which produced the report
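
The boxed header at the top of a report is regular enough to read into a dictionary. The Python sketch below assumes each field is written as "name :- value" between a pair of "|" characters, that wrapped text (like the probe description above) continues on the next line without a ":-", and that the box is closed by a second border line starting with "+-". parse_header is a hypothetical helper of my own, and the sample is a cut-down copy of the report above.

```python
from typing import Dict, Optional


def parse_header(report: str) -> Dict[str, str]:
    """Read the 'name :- value' fields from the boxed header of a report."""
    fields: Dict[str, str] = {}
    last_key: Optional[str] = None
    borders = 0
    for line in report.splitlines():
        if line.strip().startswith("+-"):
            borders += 1
            if borders == 2:
                break              # closing border: the header box has ended
            continue
        start, end = line.find("|"), line.rfind("|")
        if start == -1 or end <= start:
            continue               # not a boxed header line
        body = line[start + 1:end].strip()
        if not body:
            continue
        if ":-" in body:
            key, value = body.split(":-", 1)
            last_key = key.strip()
            fields[last_key] = value.strip()
        elif last_key is not None:
            # wrapped text continues the previous field
            fields[last_key] += " " + body
    return fields


sample = """\
+------------------------------------------------------------------------+
| Probe Id          :- XC338001                                          |
| Probe Description :- AMQ6209: An unexpected asynchronous signal (2 :   |
|   SIGINT) has been received and ignored.                               |
| Comment1          :- SIGINT                                            |
+------------------------------------------------------------------------+
MQM Function Stack
"""
fields = parse_header(sample)
print(fields["Probe Id"])
print(fields["Probe Description"])
```

The title and underline at the top of the box are skipped because no field has been seen yet at that point, so there is nothing for them to continue.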

What can I do if I have an FFST report?

Monitoring for the production of FDC files is an important part of handling the occurrence of errors in a WebSphere MQ system. Prompt handling of a problem can be key to a timely resolution.
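
One simple way to monitor for this is to poll the errors directory and compare file sizes between passes. Comparing sizes (rather than just names) matters because further reports from the same process are appended to the existing file. The Python sketch below is a minimal illustration of that idea, not a finished monitoring tool: changed_fdc_files is a name I made up, and the demo uses a temporary directory as a stand-in for /var/mqm/errors.

```python
import os
import tempfile
from typing import Dict, List


def changed_fdc_files(errors_dir: str, sizes: Dict[str, int]) -> List[str]:
    """Return FDC files that are new or have changed size since the last call."""
    changed = []
    for name in sorted(os.listdir(errors_dir)):
        if not name.upper().endswith(".FDC"):
            continue
        size = os.path.getsize(os.path.join(errors_dir, name))
        if sizes.get(name) != size:
            changed.append(name)
            sizes[name] = size     # remember the size for the next pass
    return changed


# Demo in a temporary directory standing in for /var/mqm/errors
with tempfile.TemporaryDirectory() as errors_dir:
    sizes: Dict[str, int] = {}
    path = os.path.join(errors_dir, "AMQ21449.0.FDC")
    with open(path, "w") as f:
        f.write("first report\n")
    first_pass = changed_fdc_files(errors_dir, sizes)
    second_pass = changed_fdc_files(errors_dir, sizes)
    with open(path, "a") as f:
        f.write("second report\n")
    third_pass = changed_fdc_files(errors_dir, sizes)

print(first_pass, second_pass, third_pass)
```

In practice you would call the function on a timer and raise whatever alert suits your site whenever it returns a non-empty list.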

If an FDC file is created, the next step is probably to determine whether this is something that requires you to take an action, and if so, how urgent it is. A number of factors will influence this, including:

  • Are queue managers running?
  • Are applications still working?
  • Does the probe description give any insight into why the FFST was generated?
  • Does the time and date of the FFST correspond with any other known events or occurrences at the same time which may explain the error?
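
The last of these checks is easy to script against ffstsummary output. The Python sketch below lists the FDC files whose reports fall within a few minutes of a known event. It assumes the summary layout shown earlier (file name, then a year/month/day date, then a time). The second sample row and the event time are made up purely for illustration, and reports_near is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import List


def reports_near(event: datetime, summary_lines: List[str],
                 window: timedelta = timedelta(minutes=5)) -> List[str]:
    """Return FDC file names whose report time is within `window` of `event`."""
    nearby = []
    for line in summary_lines:
        parts = line.split()
        if len(parts) < 3:
            continue               # too short to be a summary row
        try:
            # Assumes the date column is year/month/day
            stamp = datetime.strptime(parts[1] + " " + parts[2],
                                      "%Y/%m/%d %H:%M:%S")
        except ValueError:
            continue               # e.g. a shell prompt line
        if abs(stamp - event) <= window:
            nearby.append(parts[0])
    return nearby


lines = [
    # taken from the ffstsummary output earlier in this post
    "AMQ21433.0.FDC 2007/04/10 10:05:45 amqzdmaa 21433 2 XC338001 "
    "xehAsySignalHandler xecE_W_UNEXPECTED_ASYNC_SIGNAL OK",
    # hypothetical row, made up for illustration
    "AMQ99999.0.FDC 2007/04/10 14:30:00 amqzlaa0 99999 1 XC338001 "
    "xehAsySignalHandler xecE_W_UNEXPECTED_ASYNC_SIGNAL OK",
]
# hypothetical event: something known to have happened at about 10:05
event = datetime(2007, 4, 10, 10, 5, 0)
print(reports_near(event, lines))
```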

If the FFST identifies a resource issue, such as low disk space, then this will normally give enough information for a system administrator to identify and correct the source of the problem.

If you are unable to determine an explanation for the FFST, then a useful next step is to see whether others have seen this FFST before, and if so, what they found it to mean and what they needed to do.

This is where the probe id from the FFST is very useful. In the majority of cases (for one notable exception, see my discussion on signals below), this will be a unique eye-catcher for the issue being reported. This means that you can search for this short string on the WebSphere MQ support site on ibm.com or in the IBM Support Assistant. Often, this will reveal cases where someone has encountered this FFST before and the fix that resulted.

Beyond this point, you will most likely need to raise a PMR with IBM Service. It is useful to send all FFSTs from your system (rather than just the one that you believe to be of interest), as following the history can be key to resolving an issue. It is also useful to send the WebSphere MQ system error logs (/var/mqm/errors/AMQ*.LOG) and the queue manager error logs (/var/mqm/qmgrs/QMNAME/errors/AMQ*.LOG, where QMNAME is the name of your queue manager), together with a clear description of what you are seeing and the impact on the system and your business.

The MustGather technotes provide more detailed instructions about the diagnostic information that will be useful for the type of problem you are encountering.

Signal handling

I wrote above that I generally find the probe id to be a unique identifier for a specific problem. While this is usually true, one notable exception is the FFSTs produced by the signal and exception handlers.

The signal handler component produces FFSTs to report signals sent to WebSphere MQ processes. This means that the information in the FFST (such as the probe id and source code file, line number, etc.) is about the signal handler which caught the signal, not whatever it was that caused or created the signal.

This is less of a problem if the signal was generated externally to WebSphere MQ, such as the SIGINT that I generated with Ctrl-C in the example above. The FFST contains information about the process which was sent the signal and the time and date of the signal.

It can be more complex if the signal is generated from elsewhere within WebSphere MQ, such as a SIGSEGV from a segmentation fault in another WebSphere MQ process. The exception handler will generate an FFST to record the SIGSEGV, however it is important to bear in mind that any such FFST contains a report about where the SIGSEGV was caught, not where it was generated. This doesn’t mean that the cause cannot be found, but it does mean that the FFST information such as the probe id is not necessarily the sort of unique eye-catcher described above.

Generating FFSTs on request

I mentioned above that it is possible to generate FFSTs manually. This can be done using the following commands:

amqldbgn -p PID (on Windows)
or
kill -USR2 PID (on UNIX platforms)

where PID is the process ID for a WebSphere MQ process. (Note that I’ve talked before about how to get the process IDs for WebSphere MQ processes.)

FFST reports generated in this way will have a probe id that ends in 255.
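
If you are scripting against FFSTs, this gives you a cheap way to tell requested reports apart from genuine errors. The Python sketch below splits a probe id into the three parts described earlier (two-letter component code, three-digit function code, three-digit probe identifier) and checks for a probe identifier of 255. The helper names are my own, and XC338255 is a made-up probe id used only to exercise the check.

```python
from typing import NamedTuple


class ProbeId(NamedTuple):
    component: str   # two-letter component code
    function: str    # three-digit function code
    probe: str       # three-digit probe identifier


def split_probe_id(probe_id: str) -> ProbeId:
    """Split an eight-character probe id such as XC338001 into its parts."""
    if len(probe_id) != 8:
        raise ValueError("expected an 8-character probe id: %r" % probe_id)
    return ProbeId(probe_id[:2], probe_id[2:5], probe_id[5:])


def generated_on_request(probe_id: str) -> bool:
    """True if the probe identifier part is 255, as described above."""
    return split_probe_id(probe_id).probe == "255"


print(split_probe_id("XC338001"))
print(generated_on_request("XC338001"))   # probe id from the example report
print(generated_on_request("XC338255"))   # hypothetical, made-up probe id
```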

Finally

While the description I’ve given here is true at the moment, the format may change in future releases. It’s worth bearing this in mind if you decide to automate or script any steps based on the current format as described here.
