This paper introduces design and basic performance of the DSM(distributed shared memory) system on a cluster of clusters. Networking devices such as Myrinet have improved the performance of cluster systems significantly. In addition to that, such kind of network devices introduced a new hierarchical architecture; a multi-cluster, a cluster of high-performance clusters. To ease the difficulty of programming with message passing, which is the conventional programming paradigm on cluster systems, many DSM (distributed shared memory) systems have been developed in recent years. However, there have been no DSM systems developed on multi-clusters. The DSM system consists of a runtime system to support basic functions for accessing virtual shared memory built on such environment. The functions are allocation of global data, read and write accesses to global data, synchronization of the whole system, and mutual exclusion. The authors have evaluated the performance of the runtime system, built on a SMP cluster, COMPaS, at RWCP(Real World Computing Partnership) in Tsukuba, Japan. The result shows that a read access to remote memory on the same cluster costs about 0.2msec, while a read access to remote memory on other cluster costs about 1.3msec. The execution time of LU decomposition on a multi-cluster consisting two clusters of three PCs is about 2.8times faster than the time on one PC.