TY - JOUR
T1 - Approximate spectral decomposition of Fisher information matrix for simple ReLU networks
AU - Takeishi, Yoshinari
AU - Iida, Masazumi
AU - Takeuchi, Jun'ichi
N1 - Publisher Copyright: © 2023 The Authors
PY - 2023/7
Y1 - 2023/7
AB - We study the Fisher information matrix (FIM) of one-hidden-layer networks with the ReLU activation function. For such a network, let W denote the d×p weight matrix from the d-dimensional input to the hidden layer of p neurons, and let v denote the p-dimensional weight vector from the hidden layer to the scalar output. We focus on the FIM with respect to v, which we denote by I. Under certain conditions, we characterize the first three clusters of eigenvalues and eigenvectors of the FIM. Specifically, we show that the following hold approximately. (1) Since I is non-negative owing to the ReLU, the first eigenvalue is the Perron–Frobenius eigenvalue. (2) For the cluster of the next-largest eigenvalues, the eigenspace is spanned by the row vectors of W. (3) The direct sum of the eigenspace of the first eigenvalue and that of the third cluster is spanned by the set of all vectors obtained as the Hadamard product of any pair of row vectors of W. Numerical calculation confirms that these statements are approximately correct when the number of hidden nodes is about 10000.
UR - http://www.scopus.com/inward/record.url?scp=85160542772&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85160542772&partnerID=8YFLogxK
U2 - 10.1016/j.neunet.2023.05.017
DO - 10.1016/j.neunet.2023.05.017
M3 - Article
C2 - 37262931
AN - SCOPUS:85160542772
SN - 0893-6080
VL - 164
SP - 691
EP - 706
JO - Neural Networks
JF - Neural Networks
ER -