TY - JOUR
T1 - Secure wavelet matrix
T2 - Alphabet-friendly privacy-preserving string search for bioinformatics
AU - Sudo, Hiroki
AU - Jimbo, Masanobu
AU - Nuida, Koji
AU - Shimizu, Kana
N1 - Publisher Copyright:
© 2019 Institute of Electrical and Electronics Engineers Inc.. All rights reserved.
PY - 2019/9
Y1 - 2019/9
N2 - Biomedical data often includes personal information, and the technology is demanded that enables the searching of such sensitive data while protecting privacy. We consider a case in which a server has a text database and a user searches the database to find substring matches. The user wants to conceal his/her query and the server wants to conceal the database except for the search results. The previous approach for this problem is based on a linear-time algorithm in terms of alphabet size $\mathbf{|\Sigma |}$||, and it cannot search on the database of large alphabet such as biomedical documents. We present a novel algorithm that can search a string in logarithmic time of $\mathbf{|\Sigma |}$||. In our algorithm, named secure wavelet matrix sWM, we use an additively homomorphic encryption to build an efficient data structure called a wavelet matrix. In an experiment using a simulated string of length 10,000 whose alphabet size ranges from 4 to 1024, the run time of the sWM was up to around two orders of magnitude faster than that of the previous method. sWM enables the searching of a private database efficiently and thus it will facilitate utilizing sensitive biomedical information.
AB - Biomedical data often includes personal information, and the technology is demanded that enables the searching of such sensitive data while protecting privacy. We consider a case in which a server has a text database and a user searches the database to find substring matches. The user wants to conceal his/her query and the server wants to conceal the database except for the search results. The previous approach for this problem is based on a linear-time algorithm in terms of alphabet size $\mathbf{|\Sigma |}$||, and it cannot search on the database of large alphabet such as biomedical documents. We present a novel algorithm that can search a string in logarithmic time of $\mathbf{|\Sigma |}$||. In our algorithm, named secure wavelet matrix sWM, we use an additively homomorphic encryption to build an efficient data structure called a wavelet matrix. In an experiment using a simulated string of length 10,000 whose alphabet size ranges from 4 to 1024, the run time of the sWM was up to around two orders of magnitude faster than that of the previous method. sWM enables the searching of a private database efficiently and thus it will facilitate utilizing sensitive biomedical information.
UR - http://www.scopus.com/inward/record.url?scp=85043457399&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85043457399&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2018.2814039
DO - 10.1109/TCBB.2018.2814039
M3 - Article
C2 - 29994070
AN - SCOPUS:85043457399
SN - 1545-5963
VL - 16
SP - 1675
EP - 1684
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 5
M1 - 3370684
ER -