TY - GEN

T1 - Gossip algorithms for clustering problems

AU - Nguyen, Linh Thi Hoai

AU - Wada, Takayuki

AU - Masubuchi, Izumi

AU - Asai, Toru

AU - Fujisaki, Yasumasa

N1 - Funding Information:
This work was supported by CREST, Japan Science and Technology Agency.
Publisher Copyright:
© 2015 IEEE.

PY - 2015

Y1 - 2015

N2 - This paper presents several algorithms for multi-dimensional data clustering. The main difference of the algorithms introduced in this paper comparing to known ones is that the algorithms are multi-agent based and gossip. Hence, the number of computation in each step is not too large even for large data and the computational burden can be distributed. The basic algorithm is described as follows. Each data object is assigned to an agent in a network and its attributes constitute to a vector which is considered as the initial state of the agent. A common confidence threshold is set for all of the agents. The states of agents in the network will be updated time by time according to an iterative procedure: At each time, (i) two arbitrary agents are chosen randomly, (ii) they exchange their states, and (iii) if the distance between their states does not exceed the confidence threshold, they update their states as the average of the two. Under a certain condition, this algorithm converges almost surely in an equilibrium point such that any two agents either have the same final state or have distinct states which differ more than the confidence threshold from each other. In this way, the given data objects are partitioned into groups such that ones in the same group correspond to the agents with the same final state. Some modified algorithms for improving the performance of clustering and/or getting some prescribed condition are also introduced.

AB - This paper presents several algorithms for multi-dimensional data clustering. The main difference of the algorithms introduced in this paper comparing to known ones is that the algorithms are multi-agent based and gossip. Hence, the number of computation in each step is not too large even for large data and the computational burden can be distributed. The basic algorithm is described as follows. Each data object is assigned to an agent in a network and its attributes constitute to a vector which is considered as the initial state of the agent. A common confidence threshold is set for all of the agents. The states of agents in the network will be updated time by time according to an iterative procedure: At each time, (i) two arbitrary agents are chosen randomly, (ii) they exchange their states, and (iii) if the distance between their states does not exceed the confidence threshold, they update their states as the average of the two. Under a certain condition, this algorithm converges almost surely in an equilibrium point such that any two agents either have the same final state or have distinct states which differ more than the confidence threshold from each other. In this way, the given data objects are partitioned into groups such that ones in the same group correspond to the agents with the same final state. Some modified algorithms for improving the performance of clustering and/or getting some prescribed condition are also introduced.

UR - http://www.scopus.com/inward/record.url?scp=84973169023&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84973169023&partnerID=8YFLogxK

U2 - 10.1109/IECON.2015.7392163

DO - 10.1109/IECON.2015.7392163

M3 - Conference contribution

AN - SCOPUS:84973169023

T3 - IECON 2015 - 41st Annual Conference of the IEEE Industrial Electronics Society

SP - 589

EP - 594

BT - IECON 2015 - 41st Annual Conference of the IEEE Industrial Electronics Society

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 41st Annual Conference of the IEEE Industrial Electronics Society, IECON 2015

Y2 - 9 November 2015 through 12 November 2015

ER -