References: Shafi Goldwasser, Silvio Micali, and Charles Rackoff, "The Knowledge Complexity of Interactive Proof-Systems" (1985)
Interaction in proof systems can drastically reduce the knowledge shared, allowing for proofs that reveal nothing beyond the truth of the claim being proved: a prover can convince a verifier that a statement is true by answering the verifier's questions, without revealing any extra information beyond the validity of the statement itself.
References: Shafi Goldwasser, Yael Tauman Kalai, and Guy N. Rothblum, "Delegating Computation: Interactive Proofs for Muggles" (2008) (GKR Protocol)
Introduces efficient interactive proof systems that enable a computationally weak verifier to check the correctness of general complex computations performed by an untrusted powerful prover.
1. Interactive Proof Systems
1.1. The Limitation of NP Proof-System
Let's recall the influential definition of NP due to Stephen Cook and Leonid Levin (the Cook-Levin theorem): the NP proof-system consists of two communicating deterministic Turing machines A and B, respectively the prover and the verifier, where the prover is exponential-time and the verifier is polynomial-time. They read a common input and interact in a very elementary way. On input a string \( x \) belonging to an NP language L, A computes a string \( y \) (whose length is bounded by a polynomial in the length of \( x \)) and writes \( y \) on a special tape that B can read. B then checks that \( f_L(y) = x \) (where \( f_L \) is a polynomial-time computable function) and, if so, halts and accepts.
1.1.1. \( f_L(y) = x \)
We can read \( f_L(y) = x \) this way: "the certificate \( y \) attests that \( x \) belongs to L, where \( f_L \) is a polynomial-time computable function, so checking \( y \) takes only polynomial time."
1.1.2. Formalization vs. Intuition
Sometimes formalization cannot entirely capture the intuitive notion. In the context of theorem-proving, NP captures a specific form of proving a theorem: a proof that can be "written down" and verified without interacting with the prover. The certificate is like a formal written proof, and the verifier just passively checks it. This is like reading a proof in a book: once you have the book, there is no back-and-forth to clarify or ask questions about the proof.
1.1.3. Example: Co-HAMPATH Problem
We do not know whether the complement of HAMPATH (Co-HAMPATH) is in NP.
y is easy to verify: HAMPATH
For the Hamiltonian Path (HAMPATH) problem, given a solution (i.e., a path), it's easy for a verifier to check it in polynomial time.
The prover can just present the path (certificates, or \( y \)), and the verifier checks whether it's a valid Hamiltonian path (i.e., visits each vertex exactly once and satisfies the graph's edges).
y is hard to verify: Co-HAMPATH
The Co-HAMPATH problem asks whether a graph does not have a Hamiltonian path. Here, proving the non-existence of something becomes far more complex.
If you ask a prover to convince you that no Hamiltonian path exists, the proof isn't as simple as just pointing to something (like a path). Instead, you'd need to somehow verify all possible paths don't work, which could take exponential time.
1.1.4. Limitation of the NP Proof-System
In the NP model, some problems in NP (like HAMPATH) are easily verifiable, but NP does not capture the complexity of some other problems (often their complements i.e., Co-NP problems).
That's why problems like Co-HAMPATH are much harder to verify using the static NP model: In our example, there's no easy way for a prover to present a simple "proof" that no Hamiltonian path exists, and for the verifier to check it efficiently.
1.2. The Interactive Proof Systems
In the paper "The Knowledge Complexity of Interactive Proof-Systems" (1985), Shafi Goldwasser, Silvio Micali, and Charles Rackoff introduce interactive proof-systems to capture a more general way of communicating a proof.
Trust the "randomness". Trust is placed not in specific random values but in the statistical behavior over many trials.
Much like in computation, BPP (Bounded-error Probabilistic Polynomial time) algorithms provide a probabilistic analog of P, trading a small chance of error for efficiency. In verification, IP (Interactive Proof) systems provide a probabilistic analog of the class NP. IP includes problems not known to be in NP, demonstrating the greater verification power gained from randomness and interaction.
1.2.1. Interactive Pairs of Turing Machines
Prover (P)
An entity with unlimited computational power, aiming to convince the verifier of the truth of a statement.
Verifier (V)
A probabilistic polynomial-time Turing machine (with a random tape) that interacts with the prover to verify the statement's validity.
1.2.2. Interactions
The interaction consists of multiple rounds where the prover and verifier exchange messages. The verifier uses randomness to generate challenges, and the prover responds accordingly. The key properties of such systems are:
Completeness
If the statement is true, an honest prover can convince the verifier with high probability.
Soundness
If the statement is false, no cheating prover can convince the verifier except with a small probability.
1.2.3. Example: Quadratic Nonresidue Problem
- Definition An integer \( a \) is a quadratic residue modulo \( n \) if there exists an integer \( x \) such that: \[ x^2 \equiv a \pmod{n} \] An integer \( a \) is a quadratic nonresidue modulo \( n \) if no such integer \( x \) exists. Suppose A (the Prover) claims that \( a \) is a quadratic nonresidue, and B (the Verifier) wants to check that claim using an interactive proof system.
An Interactive Proof System: The verifier B begins by choosing m random numbers \( \{r_1,r_2,...,r_m \} \). For each \( i \), \( 1 \leq i \leq m \), he flips a coin:
- If it comes up heads he forms \( t_i = r_i^2 \pmod{n} \).
- If it comes up tails he forms \( t_i = a \times r_i^2 \pmod{n} \).
Then B sends \( t_1, t_2,..., t_m \) to A.
The prover, having unrestricted computing power, finds which of the \( t_i \) are quadratic residues, and uses this information to tell B the results of his last m coin tosses.
If this information is correct, B accepts.
- Why Does This Work?
If \( a \) really is a quadratic nonresidue, then by the properties of quadratic residues:
- \( t_i = a \times r_i^2 \pmod{n} \) is a quadratic nonresidue.
- \( t_i = r_i^2 \pmod{n} \) is a quadratic residue.
The prover can tell which side the coin landed on by checking whether \( t_i \) is a quadratic nonresidue or residue.
If \( a \) is a quadratic residue (the prover is lying): then both \( t_i = a \times r_i^2 \pmod{n} \) and \( t_i = r_i^2 \pmod{n} \) are quadratic residues.
This means the \( t_i \) are just random quadratic residues: they all look the "same" to the prover, so he can only guess the coin tosses, and he answers all m of them correctly with probability \( 1/2^m \).
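To make the flow concrete, here is a minimal Python sketch of one run of this protocol, assuming a small modulus \( n \) so that the prover's "unrestricted power" can be simulated by brute force; the function names (`is_qr`, `qnr_protocol`) are illustrative, not from the paper.

```python
import math
import random

def is_qr(t, n):
    # Brute-force residuosity test: stands in for the prover's
    # unlimited computing power on a toy-sized modulus.
    return any(pow(x, 2, n) == t for x in range(1, n))

def qnr_protocol(a, n, m=20):
    """Verifier's view: accept iff the prover names all m coin tosses."""
    coins, ts = [], []
    for _ in range(m):
        r = random.randrange(2, n)
        while math.gcd(r, n) != 1:          # r must be a unit mod n
            r = random.randrange(2, n)
        coin = random.randrange(2)          # 0 = heads, 1 = tails
        t = pow(r, 2, n) if coin == 0 else (a * pow(r, 2, n)) % n
        coins.append(coin)
        ts.append(t)
    # Prover's strategy: t_i is a residue exactly when the coin was heads,
    # provided a really is a nonresidue.
    guesses = [0 if is_qr(t, n) else 1 for t in ts]
    return guesses == coins

print(qnr_protocol(3, 35))  # 3 is a nonresidue mod 35: always accepts
print(qnr_protocol(4, 35))  # 4 = 2^2 is a residue: accepts w.p. 2^-20
```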
2. Knowledge Complexity
By assuming the prover has unlimited computing power, the theoretical model introduced in Interactive Proof Systems can describe many languages that cannot be captured by the NP model. But back in practice, it seems this model will only work in theory until such unlimited computing machines exist in real life, i.e., machines that can decide NP problems in polynomial time (for instance, quickly determine whether \( t \) is a quadratic nonresidue or residue). Is that true?
2.1. Secret Knowledge
It is true that an unlimited computing machine is infeasible in current practice. However, suppose the prover runs in polynomial time but holds some "secret knowledge" that lets it interact with the verifier efficiently. It can then convince the verifier that it has that "secret knowledge" without revealing it.
2.1.1. Eyewitness & Police Officer
Let us try to illustrate the above ideas using an informal example. Assume that a crime \( x \) has happened, B is a police officer, and A is the only eyewitness. A is greedy: he tells B that before he will describe what happened in \( x \), $100,000 must be transferred to his bank account. For B, it is important to verify that A really has the "secret knowledge" (the details of crime \( x \)) before making the transfer. And for obvious reasons, A cannot prove he has that "secret knowledge" by simply telling it to the police officer B. Using an interactive proof system, A can convince B that he has the "secret knowledge" \( x \) without revealing it.
2.1.2. Quadratic Nonresidue Problem
In the interactive proof system described in Example: Quadratic Nonresidue Problem, the key challenge for the Prover (A) is to determine whether each number \( t_i \) sent by the Verifier (B) is a quadratic residue or a quadratic nonresidue modulo \( n \). This determination is crucial because it allows the Prover to infer the results of B's coin tosses and respond correctly. Here, the "secret knowledge" that enables efficient computation for A is the prime factorization of the modulus \( n \). If the prover knows the prime factors of \( n \), it can efficiently compute Legendre or Jacobi symbols to determine quadratic residuosity. This "secret knowledge" lets a polynomial-time prover (A) carry out interactions with B that would otherwise be computationally infeasible. \[ \text{"secret knowledge"} + \text{poly-time machine} \equiv \text{unlimited computing power} \]
Note: In practice, the modulus \( n \) is often chosen such that its factorization is hard to obtain (e.g., a product of two large primes), ensuring that without the secret knowledge (the factors), determining quadratic residuosity remains difficult.
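For intuition, here is a sketch of how the factorization makes the prover efficient, assuming \( n = p \times q \) with known odd prime factors; `legendre` implements Euler's criterion, and the function names are illustrative.

```python
def legendre(a, p):
    # Euler's criterion: a^((p-1)/2) mod p is 1 for residues,
    # p-1 (i.e., -1) for nonresidues, and 0 if p divides a.
    ls = pow(a, (p - 1) // 2, p)
    return -1 if ls == p - 1 else ls

def is_qr_with_factors(a, p, q):
    # For a coprime to n = p*q: a is a residue mod n
    # iff it is a residue mod p and a residue mod q.
    return legendre(a, p) == 1 and legendre(a, q) == 1

print(is_qr_with_factors(3, 5, 7))  # False: 3 is a nonresidue mod 35
print(is_qr_with_factors(4, 5, 7))  # True:  4 = 2^2
```

This makes the same decision as the brute-force `is_qr` above, but in polynomial time.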
2.2. The Knowledge Complexity
2.2.1. The Knowledge Computable from a Communication
Which communications (interactions) convey knowledge? Informally, those that transmit the output of an infeasible computation (one we cannot perform ourselves).
2.2.2. Knowledge Complexity
How can we prove and ensure that the verifier gains no additional (secret) knowledge beyond the validity of the statement being proven?
- Simulator
To formalize this, the authors of the paper "The Knowledge Complexity of Interactive Proof-Systems" introduce the idea of a simulator: an algorithm that can generate transcripts of the interaction without access to the prover's secret information. The key point is that the verifier cannot distinguish between transcripts generated by interacting with the actual prover and those produced by the simulator.
- Quantifying Knowledge
By bounding the chance that the verifier can distinguish the simulator's output from a real interaction, we quantify the knowledge complexity of proofs. \( \to \) If the simulator can replicate the interaction so well that it is indistinguishable to the verifier, the knowledge complexity is considered zero. This formalization allows us to assess and prove the zero-knowledge property of certain interactive proofs. \( \to \) It shows that sometimes the verifier cannot gain knowledge, because whatever it sees could have been simulated without the prover's help.
- Summary
The Simulator acts as an algorithm that can generate transcripts of the interaction between the prover (A) and verifier (B) without knowing the prover's secret knowledge. By producing transcripts that are indistinguishable from those of a real interaction, the simulator demonstrates that the verifier gains no additional knowledge from the interaction. Therefore, if such a simulator exists, we say the proof is zero-knowledge, because any information the verifier receives could have been simulated without the prover's secret.
2.2.3. A Zero Knowledge Interactive Proof System for the Quadratic Residuosity Problem
Section 4.2 of the paper introduces a zero-knowledge IP system by carefully designing the protocol and demonstrating the existence of a poly-time simulator.
Proof Idea: The difficulty of the proof is that M must produce the coin tosses correctly, just as a real prover (A) with the secret knowledge would.
Since the simulator M simulates both sides of the interaction, it can both know and control the randomness (coin tosses) on the verifier's (B) side.
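Here is a sketch of that idea for the quadratic-nonresidue protocol above (names illustrative): because M picks B's coins itself, it can write down the prover's "answers" without ever testing residuosity, yet the transcripts are distributed exactly like real ones.

```python
import math
import random

def simulate_transcript(a, n, m=20):
    # M controls the verifier's randomness, so the "prover answers"
    # are just the coins M already knows; no secret knowledge needed.
    ts, answers = [], []
    for _ in range(m):
        r = random.randrange(2, n)
        while math.gcd(r, n) != 1:
            r = random.randrange(2, n)
        coin = random.randrange(2)
        t = pow(r, 2, n) if coin == 0 else (a * pow(r, 2, n)) % n
        ts.append(t)
        answers.append(coin)  # the honest prover's reply equals the coin
    return ts, answers
```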
3. General Purpose Doubly-Efficient Interactive Proof Systems
3.1. The Power of Randomness & Low-degree Polynomials
3.1.1. Example: Equality Testing
Two parties (i.e., Alice and Bob) each have an equal-length binary string: \[ a = (a_1, a_2, ..., a_n) \in \{0, 1\}^n, \quad b = (b_1, b_2, ..., b_n) \in \{0, 1\}^n. \] They want to cooperate (no malicious party) to determine whether \( a = b \), while exchanging as few bits as possible.
A trivial solution:
Alice sends \( a \) to Bob, who checks whether \( a = b \). The communication cost is \( n \) bits, which is optimal among deterministic protocols.
A logarithmic cost randomized solution:
Let \( F \) be any finite field with \( |F| \geq n^2 \); we then interpret each \( a_i, b_i \) as an element of \( F \).
- "Interpret": viewing the bits as a part of a larger mathematic structure (\( F \)), but the values (0 or 1) stays the same.
"Fields": Fields arise from the need for a structured and versatile system in mathematics and science to perform algebraic operations in a consistent and predictable way.
A field is a set equipped with two operations, addition and multiplication, along with their respective inverse operations, subtraction and division (except division by zero). These operations follow specific rules, such as commutativity, associativity, and distributivity, ensuring that the result of any operation between elements of the field remains within the field itself (closure).
For example: (Non-Fields)
- The set of 2×3 matrices is not closed under multiplication (the product of two 2×3 matrices is not even defined).
- In the set of 2×2 matrices, multiplication between two elements is not commutative.
Some of the elements of Z/6Z (the integers modulo 6) have no multiplicative inverse.
A number \( a \) in Z/6Z has a multiplicative inverse if there exists a number \( b \) in Z/6Z such that \( a \times b \equiv 1 \pmod 6 \). For example, 2, 3, and 4 have no multiplicative inverse; in fact, the integers mod p (Z/pZ) form a field exactly when p is prime.
Let \( p(x) = \sum_{i=1}^{n} a_i \times x^i \) and \( q(x) = \sum_{i=1}^{n} b_i \times x^i \)
- Alice picks a random \( r \) in \( F \) and sends \( (r, p(r)) \) to Bob.
- Bob calculates \( q(r) \) and outputs EQUAL iff \( p(r) \equiv q(r) \), otherwise he outputs NOT-EQUAL.
Total Communication Cost: \( \mathcal{O}(\log n) \) bits
Since \( F \) has at least \( n^2 \) elements, representing any single element takes \( \log(|F|) = \log(n^2) = 2 \log(n) = \mathcal{O}(\log n) \) bits.
- If \( a \equiv b \): Then Bob outputs EQUAL with probability 1
- If \( a \neq b \): Then Bob outputs NOT-EQUAL with probability at least \( (1 - 1/n) \) over the choice of \( r \) in \( F \).
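A sketch of the whole protocol over the prime field \( F_p \) (here \( p = 101 \geq n^2 \) for \( n = 10 \)); the function names are illustrative.

```python
import random

def fingerprint(bits, r, prime):
    # Evaluate p(x) = sum_i bits[i] * x^(i+1) at x = r over F_prime.
    return sum(b * pow(r, i + 1, prime) for i, b in enumerate(bits)) % prime

def equal_whp(a_bits, b_bits, prime=101):
    r = random.randrange(prime)          # Alice's random point
    # Alice sends (r, fingerprint(a, r)): two field elements, O(log n) bits.
    return fingerprint(a_bits, r, prime) == fingerprint(b_bits, r, prime)

a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
b = list(a)
c = list(a)
c[3] ^= 1                                # flip a single bit
print(equal_whp(a, b))   # True with probability 1
print(equal_whp(a, c))   # False with probability >= 1 - n/p > 0.9
```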
3.1.2. The Power of Low-degree Polynomials
Before proving "if \( a \neq b \), the probability that Bob is wrong is at most \( 1/n \)", let's consider why we need to define \( p(x) \) and \( q(x) \) in this way: \[ p(x) = \sum_{i=1}^{n} a_i \times x^i \] \[ q(x) = \sum_{i=1}^{n} b_i \times x^i \]
- Reed-Solomon Error Correction:
Since \( a \) and \( b \) each consist of \( n \) bits, degree \( n \) is the lowest degree that guarantees a unique representation of each \( a_i \) and \( b_i \): each bit \( a_i \) (and correspondingly \( b_i \)) is the coefficient of a distinct term in the polynomial and therefore affects the polynomial differently.
- Proof: If \( a \neq b \), then the probability that Bob is wrong is at most \( 1/n \)
Proof: Any non-zero polynomial \( d(x) \) of degree \( n \) has at most \( n \) roots in any field (i.e., at most \( n \) values \( r \) in \( F \) for which \( d(r) = 0 \)).
Assume, for contradiction, that the polynomial has more than \( n \) roots:
Let \( r_1, r_2, ..., r_{n+1} \) be distinct elements of the field \( F \), such that \( d(r_i) = 0 \) for each \( i = 1, 2, ..., n+1 \).
- Then the polynomial \( d(x) \) can be written as \[ d(x) = (x - r_1)(x - r_2) \cdots (x - r_{n+1})q(x) \] where \( q(x) \) is some polynomial of degree \( m \geq 0 \) and the \( (x - r_i) \) are factors corresponding to the roots.
Then the product \( (x - r_1)(x - r_2) \cdots (x - r_{n+1}) \) alone is a polynomial of degree \( n + 1 \), so \( d(x) \) would have degree at least \( n+1 > n \).
This contradicts the assumption that \( d(x) \) has degree \( n \). Hence the polynomial \( d(x) \) can have at most \( n \) distinct roots.
Hence, the probability that Alice randomly picks an \( r \) such that \( p(r) = q(r) \) (equivalently, \( d(r) = p(r) - q(r) = 0 \)) is at most \( n/|F| \leq n/n^2 = 1/n \).
Since the degree-\( n \) polynomial \( d(x) = p(x) - q(x) \) has at most \( n \) roots, a randomly picked value \( r \) from a field \( F \) of size \( n^2 \) makes \( d(r) \) equal to zero with probability at most \( n/n^2 = 1/n \).
- Polynomials are constrained by their degree
As we can see, a polynomial \( d(x) \) of degree \( n \) over a large field \( F \) is uniquely determined by its values at \( n+1 \) distinct points (it has \( n+1 \) coefficients, including the constant term).
If \( p(x) \) is not equal to \( q(x) \), then they can agree on at most \( n \) points (the roots of \( d(x) = p(x) - q(x) \)), meaning they differ almost everywhere on the field. This strong divergence is very useful and powerful for error detection.
- The Power of Low-degree
In practice, we aim to keep the degree of a polynomial as low as possible while ensuring that each term uniquely affects the polynomial.
A polynomial of degree \( n \) has up to \( n+1 \) terms, each with a distinct power of \( x \) and its own coefficient, which ensures that every term influences the polynomial differently.
Low-degree polynomials are powerful because if two of them differ, they differ at almost every point of a large field, so discrepancies become visible after just a few cheap evaluations.
In cryptography, this property allows for efficient detection of errors or differences, especially when random evaluations are used.
- Introducing randomness amplifies this power by preventing predictable evaluations, making it harder to cheat or hide discrepancies.
Random evaluations allow us to verify claims efficiently and securely, ensuring robustness.
By feeding random inputs into low-degree polynomials, we combine the rigidity of low-degree polynomials with the unpredictability of randomness.
3.1.3. Example: Verifying Matrix Products
The input is two \( n \times n \) matrices A, B. The goal is to compute \( A \cdot B \).
The naive time complexity is \( \mathcal{O}(n^3) \): each element of the resulting matrix \( A \cdot B \) is the dot product of a row of \( A \) and a column of \( B \), and for each of the \( n^2 \) elements you perform \( n \) multiplications and additions, for a total of \( \mathcal{O}(n^3) \) operations. The best known bound for a matrix multiplication algorithm is \( \mathcal{O}(n^{2.371552}) \).
What if an untrusted prover P claims the answer is a matrix \(C \)? Can V verify that \( C \equiv A \cdot B \) in \( \mathcal{O}(n^2) \) time?
- The \( \mathcal{O}(n^2) \) Protocol:
- V picks a random \( r \) in \( F \) and lets \( x = (r, r^2, ..., r^n) \).
- V computes \( C \cdot x \) and \( A \cdot ( B \cdot x ) \), accepting if and only if they are equal.
Runtime Analysis:
V's runtime is dominated by computing 3 matrix-vector products, each of which takes \( \mathcal{O}(n^2) \) time.
\( C \cdot x \) is an \( (n \times n) \) matrix times an \( (n \times 1) \) vector, so the time complexity is \( \mathcal{O}(n^2) \): each of the \( n \) rows is multiplied by the vector \( x \), requiring \( n \) multiplications and \( n \) additions per row.
\( (A \cdot B) \cdot x = A \cdot (B \cdot x) \) takes two matrix-vector multiplications.
Since matrix multiplication is associative, \( B \cdot x \) takes \( \mathcal{O}(n^2) \) first and produces an \( (n \times 1) \) vector \( M \); then \( A \cdot M \) also takes \( \mathcal{O}(n^2) \).
- Correctness Analysis:
If \( C \equiv A \cdot B \):
Then V accepts with probability 1
If \( C \neq A \cdot B \):
Then V rejects with probability at least \( 1 - 1/n \).
Simplified Proof: Recall that \( x = (r, r^2, ..., r^n) \). So the i-th entry of each matrix-vector product is exactly the fingerprint from Reed-Solomon Error Correction, evaluated at \( r \), of the i-th row.
So if one row of \( C \) does not equal the corresponding row of \( A \cdot B \), the fingerprints for that row will differ with probability at least \( 1 - 1/n \), causing V to reject.
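A sketch of the check in Python over \( F_p \) (a Reed-Solomon flavored variant of Freivalds' algorithm, matching the text's \( x = (r, r^2, ..., r^n) \)); the names and the choice \( p = 10007 \) are illustrative.

```python
import random

def matvec(M, v, prime):
    # One O(n^2) matrix-vector product over F_prime.
    return [sum(row[j] * v[j] for j in range(len(v))) % prime for row in M]

def verify_product(A, B, C, prime=10007):
    n = len(A)
    r = random.randrange(1, prime)
    x = [pow(r, k, prime) for k in range(1, n + 1)]   # x = (r, r^2, ..., r^n)
    return matvec(C, x, prime) == matvec(A, matvec(B, x, prime), prime)

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[19, 22], [43, 50]]           # the true product A*B
print(verify_product(A, B, C))     # True
C[0][0] += 1                       # corrupt a single entry
print(verify_product(A, B, C))     # False with high probability
```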
3.1.4. Function Extensions
- Schwartz-Zippel Lemma
Recall Proof: If \( a \neq b \), then the probability that Bob is wrong is at most \( 1/n \); the Schwartz-Zippel lemma is an extension/generalization of this to multivariate polynomials. If \( p \) and \( q \) are distinct \( l \)-variate polynomials of total degree at most \( d \), then the same kind of statement holds: if we evaluate them at a randomly chosen input, they agree with probability at most \( d/|F| \).
Total Degree:
The total degree of a polynomial is the maximum, over its monomials, of the sum of the powers of the variables in that monomial.
For example: \( \text{deg}(x^2yz^4 - 3y + 4xe^5 - xy^3z^2) = 7 \) (attained by the first monomial).
- Extensions
An extension polynomial bridges the gap between a function defined on a discrete set of points and a function defined over a continuous (or larger discrete) domain.
An \( l \)-variate polynomial \( g \) over \( F \) is said to extend \( f \) if and only if \( g \) agrees with \( f \) at all inputs where \( f \) is defined.
For example: we are given a function \( f \) that maps \( l \)-bit binary strings to a field \( F \). This means \( f \) is defined on all possible combinations of \( l \) bits (\( \{0, 1\}^l \)). But it is only defined on a finite set of points (the binary strings), which usually does not form a field, so we cannot directly leverage the algebraic tools of polynomials.
A function \( g \) is said to extend \( f \) if:
For all \( x \) in \( \{0, 1\}^l, f(x) \equiv g(x) \).
\( g \) agrees with \( f \) on all inputs where \( f \) is initially defined.
\( g \) is defined in a larger field.
Let's say \( l = 1 \); then \( f \) is defined on the input set \( \{0, 1\} \), with \( f(0) = 2 \) and \( f(1) = 3 \).
If we want to extend \( f \) to field \( R \) (real numbers), then our objective is to find a polynomial \( g(x) \) in R such that \( g(0) = 2 \), and \( g(1) = 3 \).
We can find \( g(x) = x + 2 \) defined for all \( x \) in \( R \), not just \( {0, 1} \).
By representing \( f \) as a polynomial \( g \), we can apply the rich toolbox of algebraic methods and theorems available for polynomials. For example, the Schwartz-Zippel lemma is more powerful when a low-degree extension represents the function.
- Constructing Low-Degree Extensions
There is a vector \( w = (w_0, w_1,..., w_{k-1}) \) in \( F^k \). \( W: H^m \to F \):
We define a function \( W: H^m \to F \) such that \( W(z) = w_{a(z)} \) if \( a(z) \leq k-1 \), and \( W(z) = 0 \) otherwise.
\( W \) is a way to represent the vector \( w \) as a function over \( H^m \): for indices corresponding to elements of \( w \), \( W(z) \) returns the corresponding \( w_i \); otherwise it returns \( 0 \).
\( a(z): H^m \to F \):
Here \( a(z) \) is the lexicographic order of \( z \), i.e., it transforms an \( m \)-element vector into an index into the original \( w \).
Low-Degree Extension \( \widetilde{W}: F^m \to F \):
\( \widetilde{W} \) is an extension of \( W \) from inputs in \( H^m \) to inputs in \( F^m \), such that \( \widetilde{W} \) is a polynomial of degree at most \( |H|-1 \) in each variable, which enables efficient computation and has nice algebraic properties. Why the degree is at most \( |H|-1 \) in each variable can be understood through the following cases:
- Univariate Case: \( W(x): H \to F \)
- We have \( n = |H| \) distinct points in the subset H.
- For each \( h \in H \), we want \( \widetilde{W}: F \to F \) to satisfy \( \widetilde{W}(h) = W(h) \). A univariate polynomial of degree at most \( |H|-1 \) is uniquely determined by its values on those \( n \) points.
- Multivariate Case: \( W(x): H^m \to F \)
- Similarly, we have an m-dimensional grid \(H^m\), where \( H \subseteq F\) and \( |H| = n \).
A polynomial \( \widetilde{W}(x_1, x_2,..., x_m) \) (in m variables) that:
- Agrees with \( W \) at every point in \( H^m \) (i.e., \( \widetilde{W}(h) = W(h) \) for all \( h \in H^m \)).
- Has degree at most \( |H| - 1 \) in each variable (i.e., for each variable \( x_i \), the highest exponent of \( x_i \) in \( \widetilde{W} \) is \( \leq |H|-1 \)). In other words, just as in the univariate case (where we need \( \deg(\widetilde{W}) \leq |H|-1 \) to interpolate \( |H| \) points), each variable is similarly bounded by \( |H|-1 \). This ensures \( \widetilde{W} \) can uniquely "pass through" all the points specified by \( W \) on \( H^m \).
The low-degree extension is the simplest polynomial that fits all the given points in \( H^m \); the size of \( H \) determines how "complex" the polynomial needs to be (i.e., its degree) in order to pass through all those points without ambiguity.
Here is the full definition of \( \widetilde{W}: F^m \to F \):
\[ \widetilde{W}(t_1, ..., t_m) = \sum_{i=0}^{k-1} \widetilde{B_i}(t_1, ..., t_m) \cdot w_i. \]
\( \widetilde{B_i}: F^m \to F \) Indicator Functions:
The polynomials \( \widetilde{B_i} \) act as indicator functions on \( H^m \): on that domain, \( \widetilde{B_i}(z) = 1 \) if and only if \( i = a(z) = a(t_1,...,t_m) \); otherwise \( \widetilde{B_i}(z) = 0 \).
Outside \( H^m \) (on \( F^m \setminus H^m \)), \( \widetilde{B_i} \) takes on whatever values its polynomial extension determines, which means that on inputs outside \( H^m \), \( \widetilde{W} \) can be a sum of multiple nonzero terms.
\( \widetilde{W}: F^m \to F \) Sum Selection:
\( \widetilde{W} \) is a low-degree polynomial obtained by summing over all \( k \) indices \( i \) from \( 0 \) to \( k-1 \); on inputs from \( H^m \), by the definition of \( \widetilde{B_i} \), exactly one term is "selected", and the corresponding \( w_i \) equals \( \widetilde{W}(z) \).
\( \widetilde{B}(z, p): F^m \to F \) Lagrange Basis Polynomial:
Also, we can express \( \widetilde{W}(t_1, ..., t_m) \) as:
\[ \widetilde{W}(z) = \sum_{p \in H^m} \widetilde{B}(z, p) \cdot W(p) \] This constructs the polynomial \( \widetilde{W} \) by summing the contributions of all basis polynomials \( \widetilde{B}(z, p) \), each weighted by \( W(p) \).
- \( \widetilde{B}(p, p) = 1 \)
- \( \widetilde{B}(z, p) = 0 \) for all \( z \in H^m \) where \( z \neq p \)
- When \( z \) outside \( H^m \), \( \widetilde{B}(z, p) \) can be other values
This means that when \( z \) is in \( F^m \setminus H^m \), each \( \widetilde{B}(z, p) \) is a polynomial in \( z \) that can be evaluated at any \( z \in F^m \); to compute \( \widetilde{W}(z) \) there, we sum \( \widetilde{B}(z, p) \cdot W(p) \) over all \( p \in H^m \).
\( W \) itself is only defined on the domain \( H^m \), yet the low-degree property of \( \widetilde{W} \) still holds.
By enlarging the input domain to \( F^m \) while keeping the degree of \( \widetilde{W} \) low, this way of constructing the LDE \( \widetilde{W} \) helps us effectively use the power of randomness to detect potential errors, as mentioned in The Power of Low-degree Polynomials.
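A univariate sketch of this construction over \( F_{97} \) with \( H = \{0, 1, 2\} \), assuming the Lagrange-basis form of \( \widetilde{B} \); the multivariate \( \widetilde{B}(z, p) \) is a product of one such basis polynomial per coordinate. Names are illustrative.

```python
P = 97
H = [0, 1, 2]

def lagrange_basis(h, z):
    # B(z, h): equals 1 at z = h, 0 at the other points of H,
    # and has degree |H| - 1.
    num, den = 1, 1
    for k in H:
        if k != h:
            num = num * (z - k) % P
            den = den * (h - k) % P
    return num * pow(den, P - 2, P) % P   # division via Fermat inverse

def lde(w, z):
    # W~(z) = sum_{h in H} B(z, h) * w[h]; agrees with w on H.
    return sum(lagrange_basis(h, z) * w[h] for h in H) % P

w = [5, 7, 11]                   # W(0)=5, W(1)=7, W(2)=11
print([lde(w, h) for h in H])    # [5, 7, 11]: the extension agrees on H
print(lde(w, 10))                # a point outside H of the same degree-2 polynomial
```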
- Multilinear Extensions
A multilinear extension of a function \( f: \{0, 1\}^n \to F \) (where \( F \) is a finite field) is a polynomial \( \bar{f}: F^n \to F \) that agrees with \( f \) on \( \{0, 1\}^n \) and is multilinear.
Multilinear means each variable \( x_i \) in \( \bar{f} \) has degree at most 1, which makes these polynomials highly efficient to evaluate (fixing all variables but one turns \( \bar{f} \) into a linear function of the remaining variable). And since multilinear polynomials have minimal degree, the error-detection probability is maximized for a given field size.
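A sketch of evaluating a multilinear extension at an arbitrary point of \( F_p^n \), using the standard formula \( \bar{f}(x) = \sum_{w \in \{0,1\}^n} f(w) \prod_i \big(x_i w_i + (1-x_i)(1-w_i)\big) \); the names are illustrative.

```python
from itertools import product

P = 97

def mle(f_table, x):
    # f_table maps bit-tuples to field values; x is any point in F_p^n.
    total = 0
    for w in product([0, 1], repeat=len(x)):
        chi = 1                    # multilinear indicator of the corner w
        for xi, wi in zip(x, w):
            chi = chi * (xi * wi + (1 - xi) * (1 - wi)) % P
        total = (total + chi * f_table[w]) % P
    return total

f = {(0, 0): 3, (0, 1): 1, (1, 0): 4, (1, 1): 5}
print(mle(f, (0, 1)))   # 1: agrees with f on the Boolean cube
print(mle(f, (2, 3)))   # an extension value outside {0,1}^2
```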
3.2. The Sum-Check Protocol
Suppose we are given an \( l \)-variate polynomial \( g \) defined over a finite field \( F \). The purpose of the sum-check protocol [LFKN92] is to compute the sum: \[ H := \sum_{b_1 \in \{0, 1\}} \sum_{b_2 \in \{0, 1\}} ... \sum_{b_l \in \{0, 1\}} g(b_1, b_2,...,b_l) \]
In applications, this sum will often be over a large number of terms, so the verifier (V) may not have the resources to compute the sum without help. Instead, she uses the sum-check protocol to force the prover (P) to compute the sum for her.
The verifier(V) wants to verify that the sum is correctly computed by the prover(P), where \( g \) is a known multivariate polynomial over a finite field F.
3.2.1. Initialization
P claims that the total sum equals a specific value \( H_0 \).
3.2.2. First Round
- P sends a univariate polynomial \( s_1(x_1) \) to V, which is claimed to equal:
\[ s_1(x_1) := \sum_{b_2 \in \{0, 1\}} \sum_{b_3 \in \{0, 1\}} ... \sum_{b_l \in \{0, 1\}} g(x_1, b_2,...,b_l) \]
- V calculates \( s_1(0) + s_1(1) \) and checks whether that value is equal to \( H_0 \).
Since \( s_1 \) is a univariate polynomial, V can compute \( s_1(0) + s_1(1) \) directly (without expanding the exponential-size sum that defines \( H_0 \)) in a relatively short amount of time.
- V picks a random element \( r_1 \) from F and sends it to P.
- V sets \( H_1 := s_1(r_1) \) for use in the next iteration.
\[ H_1 = s_1(r_1) := \sum_{b_2 \in \{0, 1\}} \sum_{b_3 \in \{0, 1\}} ... \sum_{b_l \in \{0, 1\}} g(r_1, b_2,...,b_l) \]
3.2.3. Iterative Rounds \( ( i = 2 \) to \( l )\)
For each round i:
- P sends a univariate polynomial \( s_i(x_i) \) to V, claimed to equal:
\[ s_i(x_i) := \sum_{b_{i+1} \in \{0, 1\}} ... \sum_{b_l \in \{0, 1\}} g(r_1,...,r_{i-1}, x_i, b_{i+1},...,b_l) \] which represents the partial sum over variables \( b_{i+1} \) to \( b_l \), with \( b_1 \) to \( b_{i-1} \) fixed to the random values chosen by the verifier in previous rounds.
- V calculates \( s_i(0) + s_i(1) \) and checks whether that value is equal to \( H_{i-1} \).
\( H_{i-1} \) is the sum in the previous iteration: \( s_{i-1}(r_{i-1}) \):
\[ H_{i-1} := s_{i-1}(r_{i-1}) := \sum_{b_{i} \in \{0, 1\}} ... \sum_{b_l \in \{0, 1\}} g(r_1,...,r_{i-1},b_i,...,b_l) \]
- V picks a random element \( r_i \) from \( F \) and sends it to P.
- V sets \( H_i = s_i(r_i) \) for use in the next iteration.
\[ H_i = s_i(r_i) := \sum_{b_{i+1} \in \{0, 1\}} ... \sum_{b_l \in \{0, 1\}} g(r_1,...,r_i,b_{i+1},...,b_l) \]
3.2.4. Final Check
In the final iteration: \[ H_l = s_l(r_l) := g(r_1, r_2, ..., r_l) \] All the \( b_i \) in the original expression \( g(b_1,...,b_l) \) have been fixed to the random values \( r_i \) chosen by V in the previous \( l \) rounds.
V then checks whether \( s_l(r_l) \) is equal to \( g(r_1,r_2,...,r_l) \) by calculating the latter herself. If \( s_l(r_l) \) equals \( g(r_1,...,r_l) \), V accepts.
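Here is a self-contained sketch of the whole protocol, restricted to a multilinear \( g \) so that each \( s_i \) has degree at most 1 and can be shipped as the pair \( (s_i(0), s_i(1)) \); the polynomial and all names are illustrative, and the prover is honest.

```python
from itertools import product
import random

P = 97
L = 3

def g(x):
    # The known l-variate polynomial (here multilinear, l = 3).
    x1, x2, x3 = x
    return (2*x1*x2 + 3*x2*x3 + x1 + 5) % P

def partial_sum(prefix, xi):
    # P's message: sum of g(prefix, xi, b_{i+1}, ..., b_l)
    # over all Boolean suffixes.
    k = L - len(prefix) - 1
    return sum(g(tuple(prefix) + (xi,) + b)
               for b in product([0, 1], repeat=k)) % P

def sum_check():
    H = sum(g(b) for b in product([0, 1], repeat=L)) % P   # P's claim H_0
    claim, rs = H, []
    for _ in range(L):
        s0, s1 = partial_sum(rs, 0), partial_sum(rs, 1)    # s_i sent by P
        if (s0 + s1) % P != claim:                         # V's round check
            return False
        r = random.randrange(P)                            # V's challenge r_i
        claim = (s0 + r * (s1 - s0)) % P                   # H_i := s_i(r_i)
        rs.append(r)
    return claim == g(tuple(rs))                           # the final check

print(sum_check())   # True: the honest prover always passes
```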
3.2.5. Soundness of the Sum-Check Protocol
Completeness holds by design: if P sends the prescribed messages, then all of V's checks will pass. Let's analyze the soundness of the protocol:
- Base Case: Final Check
The Final Check (*key) is the most crucial part of the protocol and the key to understanding it.
In the last step, V directly computes \( g(r_1,r_2,...,r_l) \) and compares it with \( s_l(r_l) \); note that this is the only time V actually evaluates the \( l \)-variate polynomial \( g \) herself.
This direct comparison is critical because it anchors the entire verification process to the actual function \( g \).
According to the Schwartz-Zippel lemma, in the final check, if \( s_l \) is not the prescribed polynomial, then the probability that \( g(r_1,...,r_l) \) equals \( s_l(r_l) \) is at most \( d/|F| \) (where \( d \) is the total degree of the polynomials \( g \) and \( s_l \)).
- Backward Reasoning
By working backwards from the last iteration, we can establish the correctness of each step based on the validity of the final check:
In the last step, if the equality of \( g(r_1,...,r_l) \) and \( s_l(r_l) \) holds, then with high probability \( s_l(x_l) \) is the correct polynomial \( g(r_1,...,r_{l-1},x_l) \), because a dishonest prover would need to guess \( r_l \) in order to fake an \( s_l(x_l) \) that still passes the test, and the chance of success is at most \( d/|F| \), where \( d \) is the total degree of the polynomial \( g \) over the field \( F \).
So if V can confirm that \[ s_l(x_l) := g(r_1,...,r_{l-1},x_l) \] is correctly formed w.h.p. in Base Case: Final Check, then in iteration \( l-1 \), \( s_{l-1}(r_{l-1}) \) can be written as:
\begin{eqnarray*} s_{l-1}(r_{l-1}) &=& \sum_{b_l \in \{0, 1\}} g(r_1,...,r_{l-2}, r_{l-1}, b_l) \\ &=& g(r_1,...,r_{l-1},0) + g(r_1,...,r_{l-1},1) \\ &=& s_l(0) + s_l(1) \end{eqnarray*} So \( s_{l-1}(x_{l-1}) \) must be correctly formed w.h.p., given a correct \( s_l(x_l) \).
- Inductive Case: If \( s_i(x_i) \) is correctly formed in round \( i \)
For any round \( i \), if we can make sure \( s_i(x_i) \) is correctly formed w.h.p., that is: \[ s_i(x_i) = \sum_{b_{i+1} \in \{0, 1\}}...\sum_{b_l \in \{0, 1\}} g(r_1,...,r_{i-1}, x_i, b_{i+1},...,b_l) \] then working backwards to the previous round:
\begin{eqnarray*} s_{i-1}(r_{i-1}) &=& \sum_{b_i \in \{0, 1\}} ... \sum_{b_l \in \{0, 1\}} g(r_1,...,r_{i-1},b_i,...,b_l) \\ &=& \sum_{b_{i+1} \in \{0, 1\}} ... \sum_{b_l \in \{0, 1\}} g(r_1,...,r_{i-1},0,b_{i+1},...,b_l) \\ &+& \sum_{b_{i+1} \in \{0, 1\}} ... \sum_{b_l \in \{0, 1\}} g(r_1,...,r_{i-1},1,b_{i+1},...,b_l) \\ &=& s_i(0) + s_i(1) \end{eqnarray*} Then, with high probability, \( s_{i-1}(r_{i-1}) \) is correct, which indicates that \( s_{i-1}(x_{i-1}) \) is also correctly formed.
By induction, in the first round, since \( s_2(x_2) \) is correctly formed, w.h.p. \( s_1(x_1) \), which must satisfy \( s_1(r_1) = s_2(0) + s_2(1) \), is formed correctly. And thus w.h.p. \( H_0 = s_1(0) + s_1(1) \) is the correct answer.
So far, we've proved the soundness of the protocol.
3.2.6. Analyzing the Sum-Check Protocol
We proved the soundness of the sum-check protocol in the previous subsection by showing that, by the Schwartz-Zippel lemma, with high probability a dishonest P cannot start from an incorrect \( H_0 \), keep every round of the protocol consistent, and still end with an \( s_l(r_l) \) that equals \( g(r_1, ..., r_l) \) in the final round.
The soundness relies on the final check: by validating the final step, V effectively validates all prior steps, due to their interdependence in the chain of validations. For example, here is a specific cheating strategy that works when there is no final check:
Imagine a dishonest P consistently uses a different function \( j(x) \) instead of the correct function \( g(x) \) throughout the protocol.
In this scenario, P computes all the partial sums and polynomials correctly with respect to \( j(x) \), ensuring consistency at each step, P only deviates from the correct computation at the final check when V computes \( g(r_1,...,r_l) \).
During the protocol, V only computes \( g \) in the final check, so the final check is the only point of detection, and the deviation escapes detection with probability at most \( d/|F| \) (the chance that \( j \) and \( g \) agree at the random point). But if there were no final check, i.e., if the protocol simply ended after round \( l \), V would have no way of telling which function P used to compute \( H_0 \).
In this part, we quantify the probability that a dishonest P can successfully convince V of a wrong value \( H_0 \).
Deviation at Round i:
For any round \( i \), let's assume that what a dishonest P sends to V, \( s_i(x_i) \), is formed incorrectly (the prescribed polynomial would give \( s_i(0) + s_i(1) \neq s_{i-1}(r_{i-1}) \), so to pass the round check with a false claim, P must send an incorrect \( s_i \)):
The probability that V's random choice lets the deviation slip through this round is at most \( d/|F| \):
In round \( i \), V sets \( H_i := s_i(r_i) \) (and in round \( l \), this value is checked against \( g(r_1,...,r_l) \)). The probability that P's incorrect \( s_i \) nevertheless agrees with the true partial sum at the randomly selected \( r_i \) from the field \( F \) is at most \( d/|F| \), where \( d \) is the total degree of the polynomials \( g \) and \( s_i \), according to the Schwartz-Zippel lemma.
The cumulative probability that V fails to detect the deviation in the future iterations is at most \( (l-i)d/|F| \):
If \( s_i(r_i) \) differs from the true partial sum at \( r_i \), P is left to prove a false claim in the recursive call:
that is, the prover must convince the verifier of a false claim in the next round.
The prover must construct \( s_{i+1}(x_{i+1}) \) such that \( s_{i+1}(0) + s_{i+1}(1) = s_i(r_i) \). This means \( s_{i+1} \) must again deviate from the true partial-sum polynomial, propagating a new error into the subsequent rounds.
Thus, the cumulative probability of acceptance when the prover deviates at round \( i \) and the verifier fails to detect it in rounds \( i+1 \) to \( l \), ending up accepting, is \( (l-i)d/|F| \).
The reason we may sum the probabilities (a union bound) is that the events of failing to detect P in each round are over V's independent random choices \( r_i \). A cheating P can adapt its messages based on V's previous random choices and the messages exchanged so far; that is, P's actions can depend on prior interactions.
The total probability of acceptance when the prover deviates at round \( i \) is \( (l-i+1)d/|F| \):
Upper bound: \( ld/|F| \). The worst-case scenario is that P deviates in the first iteration \( (i=1) \); the total probability of acceptance is then:
\[ d/|F| + (l-1)d/|F| = ld/|F| \]
3.2.7. Example: \( l = 2 \)
Let's consider a minimal sum-check protocol with \( l = 2 \), working backward from the last iteration to understand the soundness:
- Final Check:
- V computes \( g(r_1,r_2) \)
Comparison with P's \( s_2(r_2) \)
If the equality holds, then with high probability \( s_2(x_2) \) is the correct polynomial \( g(r_1,x_2) \), because a dishonest prover would need to guess \( r_2 \) to fake \( s_2(x_2) \).
- Round 2:
Since the \( s_2(x_2) \) that P sent to V is confirmed correct, the sum \( s_2(0) + s_2(1) = g(r_1,0) + g(r_1,1) = H_1 \) must be satisfied. This confirms that \( H_1 \) is correctly computed based on \( s_2(x_2) \).
- Round 1:
At the end of this round, V sets \( H_1 = s_1(r_1) \). Since \( H_1 \) is now confirmed to be \( g(r_1,0) + g(r_1,1) \), it follows that \( s_1(r_1) = g(r_1,0) + g(r_1,1) \); thus \( s_1(x_1) \) is the correct polynomial \( \sum_{b_2 \in \{0, 1\}} g(x_1, b_2) \) w.h.p., since any deviation would be detected with high probability thanks to the random \( r_1 \).
- Initialization:
V checks \( s_1(0) + s_1(1) = H_0 \). Since \( s_1(x_1) \) is correct w.h.p., \( H_0 \) must be correctly formed based on \( s_1(x_1) \).
The verifier only needs to perform a few evaluations of univariate polynomials and consistency checks (plus the single evaluation of \( g \) in the final check), making the protocol practical even for large computations.
3.3. The GKR Protocol
3.3.1. Example of the Limitations of Sum-Check Protocol: Sharp-SAT
The Sharp Satisfiability Problem (#-SAT) is the problem of counting the number of interpretations that satisfy a given Boolean formula. Let \( \phi \) be a Boolean formula of size \( S \) over \( n \) variables (i.e., \( \phi: \{0, 1\}^n \to \{0, 1\} \)); #-SAT asks us to count the number of satisfying assignments of \( \phi \).
- Arithmetization
An intuitive way to solve the #-SAT problem is to use The Sum-Check Protocol to compute: \[ \sum_{x \in \{0, 1\}^n} \phi(x) \] The sum over all possible evaluations of \( \phi \) is exactly the count we want. But as we know, the sum-check protocol requires the function (\( g \)) to be a polynomial, and to control the communication and P's and V's runtimes, we need \( g \) to be "low-degree". Using Arithmetization, we can construct an extension polynomial \( g \) of the Boolean formula \( \phi \), i.e., replace \( \phi \) with an arithmetic circuit:
Going gate-by-gate through \( \phi \), we replace each gate with the gate's multilinear extension: \[ NOT(x) \to 1 - x \] \[ AND(x, y) \to x \times y \] \[ OR(x, y) \to x + y - x \times y \]
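A sketch of this arithmetization on a toy formula, checking gate-by-gate agreement on the Boolean cube and recovering the #-SAT count as the sum the protocol would verify; the formula \( \phi \) is an illustrative choice.

```python
from itertools import product

def NOT(x):    return 1 - x
def AND(x, y): return x * y
def OR(x, y):  return x + y - x * y

def phi_arith(x1, x2, x3):
    # Arithmetization of phi(x1, x2, x3) = (x1 AND NOT x2) OR x3.
    return OR(AND(x1, NOT(x2)), x3)

# The extension agrees with the Boolean formula on all of {0,1}^3 ...
for x in product([0, 1], repeat=3):
    assert phi_arith(*x) == int((x[0] and not x[1]) or x[2])

# ... and summing it over the cube counts the satisfying assignments.
print(sum(phi_arith(*x) for x in product([0, 1], repeat=3)))  # 5
```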
- Costs of Sharp-SAT Using Sum-Check Protocol
The degree of \( g \) is at most \( S \) (the size of \( \phi \)), and evaluating \( g \) takes \( \mathcal{O}(S) \) time. Let's analyze the costs of applying the sum-check protocol to this problem over its \( n \) rounds:
Communication Cost: P sends a polynomial of degree at most \( S \) in each round, V sends one field element \( r \) in each round. The total communication cost is \( \mathcal{O}(S \times n) \).
Why is the degree \( d \) of \( s(x) \) at most \( S \)?
The degree of the \( s(x) \) that P sends to V in each round equals the degree of \( g \), the arithmetized version of the circuit \( \phi \).
The polynomial \( g \) can be represented by its coefficients, requiring \( \mathcal{O}(S) \) field elements, where \( S \) is the size of \( \phi \). In Arithmetization, when composing the gates of \( \phi \), each gate can at most double the overall degree (the AND gate \( x \times y \) has degree \( 2 \)).
In the worst-case scenario (where the gates are composed in a way that doubles the degree at every level), the degree of the polynomial representing \( \phi \) can be up to \( 2^D \), where \( D \) is the depth of the formula. For a balanced formula, the depth \( D \) is \( \mathcal{O}(\log S) \); therefore, the degree of \( g \) in each variable is at most:
\[ 2^{\mathcal{O}(\log S)} = \mathcal{O}(S) \]
V Time Complexity: \( \mathcal{O}(S) \) time to process each of P's \( n \) messages, and \( \mathcal{O}(S) \) time to evaluate \( g(r) \). The total cost is \( \mathcal{O}(S \times n) \).
P Time Complexity: P needs to compute the univariate polynomial \( s_i \) in round \( i \) by summing over all possible assignments of the remaining \( n-i \) variables, which involves \( 2^{n-i} \) evaluations of \( g \) per round. P's total time is dominated by the need to consider all \( 2^n \) possible assignments to the variables, leading to \( \mathcal{O}(S \times n \times 2^n) \).
- Limitations
Sharp-SAT is a Sharp-P-complete problem; hence, the protocol we just saw implies that every problem in Sharp-P has an interactive proof protocol with a polynomial-time verifier.
Since in #-SAT the time complexity of P is \( \mathcal{O}(S \times n \times 2^n) \), even for very simple problems the honest prover would require superpolynomial time under the sum-check protocol. This seems unavoidable for Sharp-SAT itself, since we don't know how to even solve the problem in less than \( 2^n \) time. But we can hope to prove easier problems directly, without turning them into #-SAT instances, which is not practical.
3.3.2. Introduction to the GKR Protocol
- Recall: The Notion of Interactive Proofs in 1980s
Recall Interactive Proof Systems in the 1980s: when the IP model first came out, it was only a theoretical model, and nobody cared about the runtime of the all-powerful P. At that time, the question was expressiveness: which computations can P prove to a polynomial-time verifier using interactive proofs?
- Delegating Computation: Interactive Proofs for Muggles
In the paper "Delegating Computation: Interactive Proofs for Muggles" (2008), Shafi Goldwasser, Yael Tauman Kalai, and Guy N. Rothblum introduce a protocol that lets P prove, and V verify, the correctness of general-purpose computations efficiently for both parties.
More formally, for any language computable by a log-space uniform Boolean circuit with depth \( d \) and input length \( n \), the protocol ensures:
The costs to V grow linearly with the depth \( d \) and input size \( n \) of the circuit, and only logarithmically with size \( S \) of the circuit.
V runs in time \( (n + d) \times \text{polylog}(S) \), where \( \text{polylog}(S) \) means a polynomial function of \( \log S \) (e.g., \( (\log S)^k \)).
The space complexity of V is \( \mathcal{O}(log S) \).
P's running time is polynomial in the input size \( n \).
P's runtime is not much more than performing the computation itself, i.e., time \( \text{poly}(n) \), meaning it is efficient for practical purposes.
3.3.3. Blueprint of the Protocol
- Layered Circuit
The protocol divides the circuit \( C \) into \( d \) phases: for each phase/layer \( i \), if some gate value in layer \( i \) is wrong, then some gate value in layer \( i+1 \) must be wrong as well. More specifically, some gate in layer \( i+1 \) connected to the erroneous gate in layer \( i \) must be incorrect.
- Local Correctness
With this in mind, we run a local sum-check protocol at each layer to ensure local correctness.
That is, we define a function \( V_i: F^{s_i} \to F \) (where \( s_i = \log S_i \) is the number of bits needed to index the gates of layer \( i \)), such that for any gate \( g_i \) in layer \( i \), evaluating \( V_i(g_i) \) gives the value of gate \( g_i \). We run \( d \) subprotocols, one per layer, where the protocol for layer \( i \) relates \( V_i \) to the layer feeding it, layer \( i+1 \) (more specifically, if \( V_i \) is not correct, then w.h.p. \( V_{i+1} \) is not correct). Eventually, everything is reduced to a claim about the input values (layer \( d \)), which are known to the verifier.
3.3.4. Detailed Descriptions of the Protocol
Fix a Boolean circuit \( C: \{0, 1\}^n \to \{0, 1\} \) of size (number of gates) \( S \) and depth \( d \). The GKR protocol is an interactive proof protocol for proving that \( C(x) = 1 \).
Assume without loss of generality that C is layered, which means that each gate belongs to a layer, and each gate in layer \( i \) is connected only to (determined by) gates in layer \( i+1 \). In a nutshell, the goal is to make V's runtime proportional to the depth \( d \) of the circuit \( C \) being computed, rather than its size, without increasing P's runtime by too much.
- Arithmetize C
Recall Arithmetization: convert C to a layered arithmetic circuit with fan-in 2 and \( d \) layers, consisting only of gates of the form ADD and MULT. Fan-in 2 means each gate in the \( i \)-th layer takes inputs from two gates in the \( (i+1) \)-th layer. Layer 0 denotes the output layer and layer \( d \) denotes the input layer.
We denote the number of gates in layer \( i \) by \( S_i \), and let \( s_i \) be chosen so that the gate labels of layer \( i \) live in \( H^{s_i} \). As we mentioned in the blueprint, we define a function \( V_i(z) \) at each layer \( i \) that returns the value of the gate with index \( z \).
\[ V_i: H^{s_i} \to F \]
\( V_0 \) corresponds to the output of the circuit, and \( V_d \) corresponds to the input layer. Here is a detailed Definition of \( V_i \):
To define \( V_i \), let's recall the function \( W \) in Constructing Low-Degree Extensions: imagine layer \( i \) of the circuit \( C \) as a vector of \( S_i \) gate values \( g = (g_1, g_2, ..., g_{S_i}) \), where each \( g_j \) refers to the value of that gate. Then a function \( V_i: H^{s_i} \to F \) can be defined such that \( V_i(z) = g_{a(z)} \) if \( a(z) \) is a valid gate index into the vector \( g \), and \( V_i(z) = 0 \) otherwise.
Note that for every \(p \in H^{s_i} \):
\[ V_i(p) = \sum_{w_1, w_2 \in H^{s_i}} \widetilde{add_i}(p, w_1, w_2) \times (\widetilde{V_{i+1}}(w_1) + \widetilde{V_{i+1}}(w_2)) + \widetilde{mult_i}(p, w_1, w_2) \times \widetilde{V_{i+1}}(w_1) \times \widetilde{V_{i+1}}(w_2) \]
where \( \widetilde{add_i}, \widetilde{mult_i} \) refer to the low-degree extensions of the wiring predicates \( add_i \) and \( mult_i \) of layer \( i \):
\( add_i \) (respectively \( mult_i \)) takes one gate label \( p \in H^{s_i} \) of layer \( i \) and two gate labels \( w_1, w_2 \in H^{s_i} \) of layer \( i+1 \), and outputs 1 if and only if gate \( p \) is an addition (multiplication) gate that takes the outputs of gates \( w_1, w_2 \) as input.
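A sketch of these predicates for a toy layer: a single output gate (label 0) in layer 0 that adds gates 0 and 1 of layer 1. The labels and names are illustrative, and we work with plain gate labels rather than their low-degree extensions.

```python
ADD_WIRES_0  = {(0, 0, 1)}   # (gate p in layer 0, left input w1, right input w2)
MULT_WIRES_0 = set()

def add_0(p, w1, w2):
    return 1 if (p, w1, w2) in ADD_WIRES_0 else 0

def mult_0(p, w1, w2):
    return 1 if (p, w1, w2) in MULT_WIRES_0 else 0

def V0(p, V1):
    # The value recursion from the text, over concrete labels;
    # V1 maps each layer-1 gate label to its value.
    return sum(add_0(p, w1, w2) * (V1[w1] + V1[w2])
               + mult_0(p, w1, w2) * V1[w1] * V1[w2]
               for w1 in V1 for w2 in V1)

print(V0(0, {0: 3, 1: 4}))   # 7 = 3 + 4
```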
- Low Degree Extension of \( V_i \) at Layer \( i \)
In the i-th phase \( (1 \leq i \leq d) \): P runs a local protocol with V to argue the correctness of \( V_i \).
To do this, a sum-check protocol of layer i will be applied to let P reduce the task of proving: \[ \widetilde{V_i}(z_i) = r_i \] to the task of proving: \[ \widetilde{V_{i+1}}(z_{i+1}) = r_{i+1} \] where \( z_i \in F^{s_i} \) is a random value determined by the verifier.
As we discussed in previous sections, \( \widetilde{V_i}(z) \) can be expressed as:
\[ \widetilde{V_i}(z) = \sum_{p \in H^{s_i}} \widetilde{B}(z, p) \times V_i(p) \]
By substituting the expression for \( V_i(p) \) into that formula, we get, for every \( z \) in \( F^{s_i} \): \[ \widetilde{V_i}(z) = \sum_{p, w_1, w_2 \in H^{s_i}} \widetilde{B}(z, p) \times ((\widetilde{add_i}(p, w_1, w_2) \times (\widetilde{V_{i+1}}(w_1) + \widetilde{V_{i+1}}(w_2))) + (\widetilde{mult_i}(p, w_1, w_2) \times \widetilde{V_{i+1}}(w_1) \times \widetilde{V_{i+1}}(w_2))) \]
For every \( z_i \) in \( F^{s_i} \), let \( f_{i,z_i}: F^{3s_i} \to F \) be the function defined by: \[ f_{i, z_i}(p, w_1, w_2) = \widetilde{B}(z_i, p) \times ((\widetilde{add_i}(p, w_1, w_2) \times (\widetilde{V_{i+1}}(w_1) + \widetilde{V_{i+1}}(w_2))) + (\widetilde{mult_i}(p, w_1, w_2) \times \widetilde{V_{i+1}}(w_1) \times \widetilde{V_{i+1}}(w_2))) \]
Then \( \widetilde{V_i}(z_i) \) can be expressed as: \[ \widetilde{V_i}(z_i) = \sum_{p, w_1, w_2 \in H^{s_i}} f_{i,z_i}(p, w_1, w_2) \]
- Sum-Check Protocol at Layer 0
At each layer \( i \), P wants to convince V that \( \widetilde{V_i}(z) = r_i \) (at the output layer 0: \( \widetilde{V_0}(z) = r_0 \), where \( r_0 \) is the output value of the entire circuit \( C \)).
First, P gives the claimed sum over the gates in layer 0, \( \widetilde{V_0}(z) = r_0 \) (here \( f_{0,z}: F^{3s_i} \to F \) is as defined above), which should be the final result of the entire circuit: \[ r_0 = \widetilde{V_0}(z) = \sum_{p, w_1, w_2 \in H^{s_i}} f_{0,z}(p, w_1, w_2) \]
To verify this, P runs an interactive sum-check protocol with V on the output layer: \[ r_0 = \widetilde{V_0}(z) = \sum_{p, w_1, w_2 \in H^{s_i}} \widetilde{B}(z, p) \times ((\widetilde{add_0}(p, w_1, w_2) \times (\widetilde{V_1}(w_1) + \widetilde{V_1}(w_2))) + (\widetilde{mult_0}(p, w_1, w_2) \times \widetilde{V_1}(w_1) \times \widetilde{V_1}(w_2))) \]
As we described in Base Case: Final Check, in the final step of the sum-check protocol V needs to compute, on her own, the function \( f_{0,z}(p, w_1, w_2) \) at the random inputs \( p, w_1, w_2 \) she chose.
Which means V needs to compute:
\[ ((\widetilde{add_0}(p, w_1, w_2) \times (\widetilde{V_1}(w_1) + \widetilde{V_1}(w_2))) + (\widetilde{mult_0}(p, w_1, w_2) \times \widetilde{V_1}(w_1) \times \widetilde{V_1}(w_2))) \]
and compare it with the value implied by the final sum-check message that P sent her, to check the correctness of P's claims.
Note that V can evaluate \( \widetilde{add_0} \) and \( \widetilde{mult_0} \) on her own, using her knowledge of the structure of the circuit.
But things are different here: while part of \( f_{0,z}(p, w_1, w_2) \) can be evaluated by V herself, since she knows the circuit (\( \widetilde{add_0}, \widetilde{mult_0} \)), the main computational burden in this verification task is computing \( \widetilde{V_1}(w_1) \) and \( \widetilde{V_1}(w_2) \), which requires time \( \text{poly}(S) \) since it depends on the values of the gates in layers \( 2, 3, ..., d \).
So instead, in this protocol, P sends both of these values, \( r_{1,1} = \widetilde{V_1}(w_1) \) and \( r_{1,2} = \widetilde{V_1}(w_2) \), to V and claims they are true, and then the following interactive reduction protocol is used to "reduce" them to a single claim used in the next layer:
So far, we reduced the task of proving that \( \widetilde{V_0}(z) = r_0 \) to the task of proving both \( \widetilde{V_1}(w_1) = r_{1,1} \) and \( \widetilde{V_1}(w_2) = r_{1,2} \).
However, recall our goal was to reduce the task of proving \( V_0(z_0) = r_0 \) to the task of proving a single equality of the form \( V_1(z_1) = r_1 \). What remains is to reduce the task of proving two equalities of the form \( \widetilde{V_1}(w_1) = r_{1,1} \) and \( \widetilde{V_1}(w_2) = r_{1,2} \) to the task of proving a single equality of the form \( \widetilde{V_1}(z_1) = r_1 \). This is done via the following (standard) process:
- V constructs a line \( \gamma: F \to F^{s_i} \) through the two points \( w_1 \) and \( w_2 \), such that \( \gamma(0) = w_1 \) and \( \gamma(1) = w_2 \), and then sends \( \gamma \) to P.
- P encodes the composition of \( \widetilde{V_1} \) and \( \gamma \) as a univariate polynomial \( f: F \to F \) such that \( f(x) = \widetilde{V_1}(\gamma(x)) \), and then sends it to V.
- V checks that \( f(0) = r_{1,1} \) and \( f(1) = r_{1,2} \). If the checks pass, V chooses a random element \( r \) from \( F \) and computes \( f(r) \), producing the new claim \( f(r) = \widetilde{V_1}(\gamma(r)) \).
- V then defines \( l := \gamma(r) \in F^{s_1} \) and sends \( (l, r) \) to P. Thus, in the next round of the sum-check protocol, P is left to prove a single claim:
\[ \widetilde{V_1}(l) = f(r) \]
Why the reduction works:
\( f \) "captures" the values of \( \widetilde{V_1} \) along the line \( \gamma \).
The random point \( r \) is chosen independently by V. If P tries to cheat, there is only a very low probability that P can craft an invalid \( f \) that satisfies \( f(0) = r_{1,1} \) and \( f(1) = r_{1,2} \) and still survives the spot check at \( r \).
If the prover can prove \( f(r) = \widetilde{V_1}(l) \) in the next round, this strongly suggests that \( \widetilde{V_1}(w_1) = r_{1,1} \) and \( \widetilde{V_1}(w_2) = r_{1,2} \) were valid claims with high probability.
The new \( z \) is \( \gamma(r) \), a point along the line \( \gamma \) connecting \( w_1 \) and \( w_2 \).
If \( \gamma \) is correctly formed, it is determined solely by \( w_1 \) and \( w_2 \); so with a randomly chosen \( r \), if the claim at the new point \( z \) is correct, then w.h.p. the claims at \( w_1 \) and \( w_2 \) are also correct.
The new \( r_1 \) is \( f(r) \); this value is now treated as the target value in the next layer of the protocol.
Since \( f \) is defined as \( \widetilde{V_1} \) composed with \( \gamma \), any inconsistency between \( \widetilde{V_1} \) and \( f \) would manifest at many points of the line. Thus, if \( f(0) \) and \( f(1) \) are correct and the degree of \( f \) is as prescribed, then \( f(r) \) at a randomly chosen \( r \) is also correct with high probability.
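A sketch of this two-claims-to-one reduction over \( F_{97} \), with an illustrative stand-in for \( \widetilde{V_1} \); in the real protocol P sends \( f \) as a low-degree polynomial (its coefficients), while here we simply evaluate the composition.

```python
import random

P = 97

def V1(x, y):
    # An illustrative stand-in for the multilinear V~_1.
    return (3*x*y + 2*x + y + 1) % P

w1, w2 = (4, 9), (7, 2)

def gamma(t):
    # The line through the two queried points: gamma(0)=w1, gamma(1)=w2.
    return tuple((a + t * (b - a)) % P for a, b in zip(w1, w2))

def f(t):
    # P's message: the restriction of V~_1 to the line gamma.
    return V1(*gamma(t))

r11, r12 = V1(*w1), V1(*w2)            # the two claims being reduced
assert f(0) == r11 and f(1) == r12     # V's consistency checks

r = random.randrange(P)                # V's random challenge
l = gamma(r)
print(f(r) == V1(*l))                  # True: the single remaining claim
```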
- Sum-check Protocol at Layer \( d \) and the Final Check
In iteration \( d \), which is very similar to the previous phases, P wants to convince V that \( r_d = \widetilde{V_d}(z) \). At the end of the protocol at this layer, P sends \( s_d(z) \), which refers to the low-degree polynomial of the input. Since this layer is the input layer, V can verify it on her own: this amounts to computing a single point of the low-degree extension of the input \( x \).
If everything matches the input, the function \( s_d \) is correctly formed, thus \( \widetilde{V_d} \) is also correctly formed, and in the previous layer \( \widetilde{V_{d-1}} \) is also valid, and so on, all w.h.p.
Thus, according to the Soundness of the Sum-Check Protocol, especially Backward Reasoning, we get w.h.p. that \( V_0 \) is correctly formed, which implies that \( r_0 \), i.e., \( C(x) \), is correct w.h.p.