PSI also known as private set intersection, is a special problem in secure computation that allows two parties with distrust respectively holding two private sets to compute their intersection, as shown in Figure 1. Formally speaking, let A B denote two parties, their corresponding input is X Y separately. The PSI technique is to compute $X \cap Y$ without revealing additional information to anyone except the intersection itself. The motivation of the PSI is fairly straightforward, it is from several practical applications that require PSI functionality.
- Advertising optimization: advertisers such as Google Facebook and merchants are driven to compute conversion rates that benefit both of them. But hindered by privacy law they cannot simply share data and compute ad conversion rates.
- Genome matching: genetic test services for diagnosis assistant such as Wegene and 23andme have become very popular. Recently the privacy of genome data is drawing great attention as the particularity of genome data. First once leaking genome data leaking them forever, because it is impossible to change them like changing your password. Second is that encryption schemes in the foreseeing future can not be secure for a life long time. So privacy-preserving schemes are ideal solutions while enabling users to keep control of their data.
- privacy contact discovery. In social networks when signing up for a new user it is often essential to identify current registered users who are also contacts of the new user.
Of course, there are variants due to the different privacy goals of practical applications. For example, there may be a situation that instead of only one of them can get the output both of the parties are capable to get the intersection. Another one is that the output of the protocol is the size of the intersection $|X \cap Y|$ rather than the elements of $X \cap Y$
So far we see the most naive way to compute PSI which sharing their data to others totally breaks the privacy guarantee of PSI. Instead of directly sharing plaintext, the less naive way is to sharing hash of the each entry in the data. Intuitively this solution works well and we see many real-world applications adapt it. The key issue is that under some circumstances users are capable to guess the plaintext, such as when the space of input is relatively small(IP addresses), brute-forcing can easily comprise the privacy of the whole protocol. It’s clear that the naive way to compute PSI cannot meet the PSI privacy definition. the academic community has proposed four solutions to this problem, we will explore the classial ideas of them in the following.
- Public-key cryptography-based: under DDH assumption$(g^a,g^b,g^{ab})\approx(g^a,g^b,g^c)$, where $r\stackrel{R}{\leftarrow}\mathbb{Z}_q,\approx$ means the probability distributions of the two tuples are computationally indistinguishable, similar to hash-based solution, PSI problem can be solved by finding couples where $X^{ab}\cap Y^{ab} mod\;p$ .
- Circuit-base solution: secure multi-party computation is the generic solution in jointly computing $f(x,y)$ between parties while preserving the privacy of their inputs. Any computable function can be securely evaluated in the circuit representation of the function.PSI can be seen as a special f in MPC context, thus garbled circuit or arithmetic circuit is the solution to PSI.
- OT-based solution. An oblivious transfer protocol that one party with his private choices and the other party input a group of data of two ensures that the one choose his output and the others one is certain that the one receives one element in a group. Say $A(x_{i,0},x_{i,1}),B(b_i)$ . After OT B receives $x_{i,b}$, while A knows that B gets a $x$ but don’t know which one. The key idea to do PSI in OT is to set the choice bit to 1 when in $x_i$ th OT. A concrete example is that suppose $X={1,3,5,6},Y={2,3,5,7}$ . B generate random OT input groups $(r_{i,0},r_{i,1})$ , and A set his choice bit to 1 in $X_i$ th OT. After OT B send $r_{y_i,1}$ to A, based on that A can easily compute the result of PSI.
In this short article, I reviewed what is PSI and why we study PSI problem. After that we give a high level and simple enough description of three fundamental solutions to this problem. In the next article, I will cover more details in these solutions and discuss their pros and cons.