# AN EXPERIENCE ON FIXING PROBLEM ON VMEBUS MODULES

A. Akiyama, T. Katoh, K. Kudo, Y. Mori, T. T. Nakamura, J-I. Odagiri, M. Tejima, N. Yamamoto, KEK, Tsukuba, Ibaraki 305-0801, Japan

### Abstract

Hardware errors that occur in a complex system built from a common bus and many kinds of modules are very difficult to fix. In the KEKB control computer system, such an error appeared when a VME-MXI driver module and a PowerPC CPU module were installed in the same subrack of a beam monitor IOC. It took us some time to identify the true source of the problem, a malfunctioning VME-MXI module in the IOC, and to fix it.

The CPU in a beam monitor IOC halted as often as once a day, depending on the configuration of the VME subrack (e.g., the number of modules and the position of each module). With 20 such computers, this halt rate was too high for stable operation of the KEKB accelerators. We observed noise on the VME backplane signals when the VME-MXI driver module was accessed. We first tried to improve the waveform, and thereby reduce the number of system halts, by adding load to the bus with a bus extender module inserted in a slot. This reduced the number of halts considerably but was not a final solution. We then analyzed the bus signals carefully and found abnormal bus cycles: when the CPU module requested a write cycle to the VME-MXI module, the CPU module did not complete the cycle. We reported this to the manufacturer, who in reply sent us information on a patch for the module. Since we applied the patch to all our VME-MXI modules, we have not observed any halts. This paper describes the process.

#### **1 INTRODUCTION**

The control computer system for the KEKB accelerators [1] has a three-layer architecture, known as the "standard model", which is common in accelerator control systems. In the KEKB accelerator control system, VMEbus computers form the equipment control layer (the middle layer); they were chosen for ease of upgrading and alternate sourcing, and for reliability, availability and serviceability. EPICS [2] was adopted as the software toolkit, and a computer in the middle layer is called an IOC (Input/Output Controller). Since VMEbus was proposed about 20 years ago as an international standard bus for microprocessors, many compatible boards have been developed and sold worldwide. Some modules are already obsolete and no longer supported by their manufacturers; however, the bus itself is still widely used because it is a good standard. Problems often arise from combining modules produced by different vendors, or even by the same manufacturer.

The typical configuration of an IOC for the beam position monitoring system in the KEKB control system is shown schematically in Fig. 1. As the CPU module, we have used the FORCE CPU-40 (MC68040), CPU-64 (MC68060), FORCE PowerCore 6603 (PPC 603e) and now the FORCE PowerCore 6750 (266 MHz and 400 MHz PPC 750). We use VXI modules made by Hewlett-Packard Japan for switching and detecting beam position monitor signals. Between the VME subrack and the VXI mainframes, we use MXIbus driven by a VME-MXI module originally produced by National Instruments.

The VXI modules were developed and tested by the manufacturer using an HP9000/V743 system and VEE software. At KEK, we developed and tested the EPICS driver software on the FORCE CPU-40 and then ported it to the PowerCore 6603 and 6750. The transition appeared to go smoothly.

#### **2 PROBLEMS ENCOUNTERED**

#### 2.1 Sudden CPU Halts

During installation, while we were fixing the usual initial hardware and software problems, we found that the beam monitor IOCs stopped once every few days without any sign or message. However, the hardware indicators showed that the CPUs were accessing the VXI modules. We tested several combinations of CPU modules and VME-MXI modules. The CPU-40 and CPU-64 stopped with the "BUS ERROR" indicator lit, whereas the PowerCore 6603 and PowerCore 6750 stopped without any sign.

The accumulated number of reboots is shown in Fig. 2; the slope of the plotted curve gives the rate of CPU halts. The count may include some reboots made during software development, but these are few compared with the number of CPU halts.
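Because the halt rate is the slope of the cumulative curve, it can be estimated between any two readings as the change in count divided by the elapsed time. The following is a minimal sketch, assuming the counts are recorded as (date, cumulative reboots) pairs; the function name `halt_rates` and the readings are hypothetical and are not taken from Fig. 2.

```python
from datetime import date


def halt_rates(readings):
    """Average reboot rate (reboots per day) between successive readings.

    readings: list of (date, cumulative_reboots), sorted by date.
    Each returned value is the slope of the cumulative curve over one interval.
    """
    rates = []
    for (d0, n0), (d1, n1) in zip(readings, readings[1:]):
        days = (d1 - d0).days
        if days <= 0:
            raise ValueError("readings must be in strictly increasing date order")
        rates.append((n1 - n0) / days)
    return rates


if __name__ == "__main__":
    # Hypothetical readings for illustration only; not read from Fig. 2.
    readings = [(date(1999, 1, 14), 0), (date(1999, 1, 24), 20), (date(1999, 2, 3), 22)]
    print(halt_rates(readings))  # [2.0, 0.2]
```

A drop in the returned rate corresponds to the kind of sudden decrease in slope described for the commissioning period; reboots made during software development would inflate the estimate slightly.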



Fig. 1. Basic system configuration of a BM IOC. NI: National Instruments; HP: Hewlett-Packard (now Agilent Technologies); HP43591A: FFT module; HP43592A: multiplexer module.

#### 2.2 For the Commissioning of KEKB

The first step was to understand what was happening in the VME subracks when the IOCs halted. We used a VMEbus analyzer, oscilloscopes, and a VMEbus extender for attaching probes to the module. We then waited for halts, but their frequency fell once the extra module was added, and some IOCs never stopped again. As there was not enough time to fix the problem completely before the commissioning of the KEKB accelerators, we decided to install bus extenders in all the related VME subracks. For some important IOCs, we put bus analyzers into their slots. After that, IOC halts decreased to once a week or less, which appears in Fig. 2 as a sudden decrease in the slope of the plotted lines. The sudden increase in slope last December was caused by an increase in the sampling frequency of the beam positions.

#### 2.3 Detailed Tests

Fig. 2: Accumulated number of CPU reboots of BM IOCs from 1999/01/14 through 2001/04/03.

Later tests showed that the CPU stopped when it accessed the VME-MXI module in write mode. By increasing the number of write accesses on the test bench, we raised the halt frequency so that the event could be trapped more easily. Using an oscilloscope, we caught an abnormal signal when the PowerCore 6750 wrote to the VME-MXI module: the CPU module does not release the VMEbus because of an abnormal DTACK\* signal. Examples of normal and abnormal DTACK\* signals are shown in Fig. 3; the normal waveform (left) shows little ringing, whereas the abnormal waveform (right) has a sharp, deep dip.
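Why adding write accesses makes the event easier to trap can be made explicit with a simple model, which is an assumption for illustration only: suppose each write to the VME-MXI module independently triggers a halt with a small, constant probability $p$, and the test bench issues $R$ writes per second. Halts then occur at a rate $pR$, and the mean waiting time to catch one is

$$
\bar{T} = \frac{1}{pR}.
$$

For a hypothetical $p = 10^{-9}$, raising $R$ from $10^{4}$ to $10^{6}$ writes per second shortens $\bar{T}$ from $10^{5}$ s (about 28 hours) to $10^{3}$ s (about 17 minutes). The real failure depended on the module's timing, so $p$ need not be constant; the model only shows why more write traffic shortens the wait.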



Fig. 3: Examples of normal (left) and abnormal (right) DTACK\* signals.
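Once a waveform like those in Fig. 3 has been captured as voltage samples, an abnormal one can be screened automatically. The helper below is a hypothetical sketch, not the analysis we performed: it assumes standard TTL input thresholds (logic low at or below 0.8 V, logic high at or above 2.0 V) and counts how many times an active-low line such as DTACK\* enters the logic-low region. A clean single assertion gives one entry; a glitch that crosses both thresholds gives an extra one. The sample values are illustrative and not read from Fig. 3.

```python
# Hypothetical screening helper for a sampled active-low bus line
# (e.g. DTACK*). Thresholds are standard TTL input levels.
VIL_MAX = 0.8  # volts: at or below this, the input reads as logic low
VIH_MIN = 2.0  # volts: at or above this, the input reads as logic high


def count_low_entries(samples, vil=VIL_MAX, vih=VIH_MIN):
    """Count entries into the logic-low region, with hysteresis.

    Assumes the trace starts with the line released (high). After an
    entry into low, the line must rise to vih or above before another
    entry is counted, so an edge lingering between vil and vih is not
    counted twice.
    """
    is_low = False
    entries = 0
    for v in samples:
        if not is_low and v <= vil:
            is_low = True
            entries += 1
        elif is_low and v >= vih:
            is_low = False
    return entries


if __name__ == "__main__":
    # Illustrative sample values only; not read from Fig. 3.
    clean = [3.0, 3.0, 0.3, 0.2, 0.2, 2.9, 3.0]
    glitchy = [3.0, 0.3, 0.2, 2.9, 0.5, 2.9, 3.0]
    print(count_low_entries(clean))    # 1
    print(count_low_entries(glitchy))  # 2
```

The hysteresis mirrors how a receiver with an undefined band between the two thresholds behaves: ringing that stays inside the band is not counted, which keeps slow but otherwise clean edges from being flagged.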

#### 2.4 The Cause of Bus Locking

When the CPU module initiates a VMEbus write cycle, AS\*, WRITE\* and other signals are asserted on the VMEbus and received by the VME-MXI module, which repeats them through the MXIbus to the VXI-MXI controller in the VXI mainframe and finally to the VXI module. The selected VXI module responds by asserting DTACK\*, which is returned via the VXI-MXI module to the VME-MXI module and from there to the CPU. We observed that the DTACK\* signal in the VXI mainframe was normal and very clean. Moreover, no module other than the VME-MXI module produced such an abnormal DTACK\* signal. We therefore identified the VME-MXI module as the source of the abnormal DTACK\* signal.
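The interlocked handshake that the VME-MXI module must honour on the VME side can be sketched as a small sequence checker. This is a simplified model of our own for illustration: it follows only the data strobe (the two strobes DS0\*/DS1\* collapsed into one hypothetical "DS" event) and DTACK\* edges of a single write cycle, and ignores AS\*, WRITE\*, BERR\*, addressing and all timing. The event names and trace format are hypothetical.

```python
# Simplified model of the interlocked DS*/DTACK* handshake in one VMEbus
# write cycle. Events are (signal, asserted) pairs; "asserted" means the
# active-low line is driven low. Names and trace format are hypothetical.
EXPECTED_CYCLE = [
    ("DS", True),      # master asserts the data strobe
    ("DTACK", True),   # slave acknowledges the transfer
    ("DS", False),     # master releases the data strobe
    ("DTACK", False),  # slave releases DTACK*, ending the cycle
]


def check_write_cycle(events):
    """Return "ok" or a description of the first handshake violation."""
    for i, expected in enumerate(EXPECTED_CYCLE):
        if i >= len(events):
            return f"incomplete cycle: waiting for {expected}"
        if events[i] != expected:
            return f"violation at event {i}: got {events[i]}, expected {expected}"
    if len(events) > len(EXPECTED_CYCLE):
        return f"spurious event after cycle end: {events[len(EXPECTED_CYCLE)]}"
    return "ok"


if __name__ == "__main__":
    clean = [("DS", True), ("DTACK", True), ("DS", False), ("DTACK", False)]
    # Hypothetical glitch: DTACK* briefly released and re-asserted.
    glitchy = [("DS", True), ("DTACK", True), ("DTACK", False),
               ("DTACK", True), ("DS", False), ("DTACK", False)]
    print(check_write_cycle(clean))    # ok
    print(check_write_cycle(glitchy))  # violation at event 2: got ('DTACK', False), expected ('DS', False)
```

The point of the model is that every DTACK\* edge has exactly one legal place in the cycle; an extra edge, wherever it falls, leaves master and slave disagreeing about the state of the transfer.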

#### **3 CPU BOARD RESPONSES**

All the CPU boards use VMEbus as the external bus, but their internal local buses differ: the CPU-40 uses the 68040 bus, while the PowerCore 6750 uses PCI. Each board bridges its local bus to VMEbus with a custom chip: the FGA-002 (FORCE) on the CPU-40, the CIX64 on the CPU-64, the Universe on the PowerCore 6603, and the Universe II or Universe IIB (Tundra) on the PowerCore 6750. The CPU-40 and CPU-64 detect a "BUS ERROR", whereas the PowerCore 6603 stops in the same way as the PowerCore 6750, though less frequently. The response to the abnormal DTACK\* signal thus depends on the CPU and its bus-bridge chip, but in every case it is clear that the DTACK\* signal driven by the VME-MXI module is abnormal.

#### **4 THE SOLUTION**

We examined the circuit carefully using the circuit diagram obtained from the manufacturer and sent them more detailed information. In response, National Instruments identified a setup-time problem that causes the module to return an abnormal DTACK\* signal, and finally sent us information on a patch. The patch re-latches the original DTACK\* signal with an unused D-type flip-flop in an FPGA on the board. After we applied it, the module worked perfectly, producing a clean DTACK\* signal without the dip. Since the same patch was applied to all the modules, the IOCs have never stopped again.
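The effect of re-latching can be illustrated with a discrete-time model of a D flip-flop. This is only a sketch of the general technique: the patch details beyond those stated above (clock source, clock period, FPGA logic) are not given here, so the function names, clock period and input trace below are hypothetical. The model also ignores setup and hold times and metastability, which are exactly the effects a real design must respect. The output changes only at clock edges, so a glitch that falls between two edges does not propagate; a glitch that coincides with an edge would still be captured.

```python
def relatch(signal, clock_period, phase=0):
    """Simulate a D flip-flop clocked every `clock_period` time steps.

    signal: logic levels of the input (True = asserted), one per step.
    The output takes the input value at each clock edge and holds it
    until the next edge. Setup/hold times and metastability are ignored.
    """
    if clock_period <= 0:
        raise ValueError("clock_period must be positive")
    q = False  # flip-flop assumed cleared (line released) at start
    out = []
    for t, d in enumerate(signal):
        if (t - phase) % clock_period == 0:  # rising clock edge
            q = d
        out.append(q)
    return out


def transitions(levels):
    """Number of level changes in a logic trace."""
    return sum(1 for a, b in zip(levels, levels[1:]) if a != b)


if __name__ == "__main__":
    # Hypothetical input: asserted for steps 4-11, with a one-step glitch at step 6.
    raw = [False] * 4 + [True, True, False] + [True] * 5 + [False] * 4
    latched = relatch(raw, clock_period=4)
    print(transitions(raw), transitions(latched))  # 4 2
```

The price of re-latching is a delay of up to one clock period before the acknowledge appears on the bus, which is usually acceptable against the bus timeout.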

#### **5 CONCLUSION**

The best approach is to perform conformance tests, from the development phase onward, with all the modules that will be used in the actual system. It is natural, however, to adopt the latest electronic devices and, after several years, to replace components with newer ones for higher performance. In some cases, the system must be upgraded to meet new requirements. Old modules also become obsolete, even when they follow a standard bus. Inexpensive modules can simply be replaced by new ones, but expensive ones often have to be kept in use.

New modules must therefore be introduced with care, and we recommend maintaining a standard test bench and a standard procedure for conformance testing.

#### ACKNOWLEDGMENT

The authors wish to thank the members of the KEKB accelerator team for their patient understanding, advice and help.

#### REFERENCES

- [1] A. Akiyama et al., "KEKB Control System: The Present and the Future", Proc. 1999 Particle Accelerator Conference, New York, 1999, pp. 343-345.
- [2] L. Dalesio et al., "The Experimental Physics and Industrial Control System Architecture: Past, Present, and Future", Proc. ICALEPCS, Berlin, Germany, 1993, pp. 179-184.