
Verifiers in Ghostferry are designed to ensure that Ghostferry did not corrupt or miss data. There are three verifiers: the ChecksumTableVerifier, the InlineVerifier, and the TargetVerifier. A comparison of the ChecksumTableVerifier and InlineVerifier is given below:





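+--------------------------+-------------------------+-----------------------------------+
|                          | ChecksumTableVerifier   | InlineVerifier                    |
+--------------------------+-------------------------+-----------------------------------+
|                          |                         | Verify row after insert; reverify |
|                          |                         | changed rows before and during    |
|                          |                         | cutover.                          |
+--------------------------+-------------------------+-----------------------------------+
| Impacts on Cutover Time  | Linear w.r.t. data size | Linear w.r.t. change rate [1]     |
+--------------------------+-------------------------+-----------------------------------+
| Impacts on Copy Time [2] |                         | Linear w.r.t. data size           |
+--------------------------+-------------------------+-----------------------------------+
| Memory Usage             |                         | Linear w.r.t. rows changed        |
+--------------------------+-------------------------+-----------------------------------+
| Partial table copy       | Not supported           |                                   |
+--------------------------+-------------------------+-----------------------------------+
| Worst Case Scenario      | Large databases cause   | Verification is slower than the   |
|                          | unacceptable downtime   | change rate of the DB             |
+--------------------------+-------------------------+-----------------------------------+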

If you want verification, you should try with the ChecksumTableVerifier first if you’re copying whole tables at a time. If that takes too long, you can try using the InlineVerifier. Alternatively, you can verify in a staging run and not verify during the production run (see Running ghostferry-copydb in production).

Note that the InlineVerifier on its own may miss some cases. Using it together with the TargetVerifier is recommended if these cases are possible.




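+------------------------------------------------------------+------------------+------------------+
|                                                            | Inline           | Inline + Target  |
+------------------------------------------------------------+------------------+------------------+
| Data inconsistency due to Ghostferry issuing an incorrect  | Yes [3]          |                  |
| UPDATE on the target database (example: encoding-type      |                  |                  |
| issues).                                                   |                  |                  |
+------------------------------------------------------------+------------------+------------------+
| Data inconsistency due to Ghostferry failing to INSERT on  |                  |                  |
| the target database.                                       |                  |                  |
+------------------------------------------------------------+------------------+------------------+
| Data inconsistency due to Ghostferry failing to DELETE on  |                  |                  |
| the target database.                                       |                  |                  |
+------------------------------------------------------------+------------------+------------------+
| Data inconsistency due to a rogue application issuing      | Sometimes [4]    |                  |
| writes (INSERT/UPDATE/DELETE) against the target database. |                  |                  |
+------------------------------------------------------------+------------------+------------------+
| Data inconsistency due to missing binlog events when       | Sometimes [5]    | Sometimes [5]    |
| Ghostferry is resumed from the wrong binlog coordinates.   |                  |                  |
+------------------------------------------------------------+------------------+------------------+
| Data inconsistency if Ghostferry’s Binlog writing          | Probably not [6] | Probably not [6] |
| implementation is incorrect and modifies the wrong row on  |                  |                  |
| the target (for example, an UPDATE is supposed to go to    |                  |                  |
| id = 1 but Ghostferry instead issues a query for id = 2).  |                  |                  |
| This is an unrealistic scenario, but is included for       |                  |                  |
| illustrative purposes.                                     |                  |                  |
+------------------------------------------------------------+------------------+------------------+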

IterativeVerifier (Deprecated)

NOTE! This is a deprecated verifier. Use the InlineVerifier instead.

IterativeVerifier verifies the source and target in a couple of steps:

  1. After the data copy, it compares the hash of each applicable row on the source with the hash of the same row on the target to make sure they are the same. This is known as the initial verification.

    1. If they are the same: the verification for that row is complete.

    2. If they are not the same: add the row to a reverify queue.

  2. Any row changed during the initial verification process is also added to the reverify queue.

  3. After the initial verification, verify the hashes of the rows in the reverify queue again. This is done to reduce the time needed to reverify during the cutover, as we assume the reverify queue will become smaller during this process.

  4. During the cutover stage, verify the hashes of all rows in the reverify queue.

    1. If they are the same: the verification for that row is complete.

    2. If they are not the same: the verification fails.

  5. If no verification failure occurs, the source and the target are identical. If a verification failure does occur (step 4.2), then the source and target are not identical.
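The steps above can be sketched in a few lines of Python. This is only an illustration, not Ghostferry's implementation: the source and target are modelled as plain dicts keyed by PK, and `row_hash`, `mismatched`, and `iterative_verify` are hypothetical names. Rows changed while verification runs are passed in as lists of PKs rather than read from a binlog.

```python
import hashlib


def row_hash(row):
    """Fingerprint a row (hypothetical scheme: MD5 over the row's repr)."""
    return hashlib.md5(repr(row).encode("utf-8")).hexdigest()


def mismatched(source, target, pks):
    """Return the PKs among `pks` whose hashes differ between source and target."""
    return {pk for pk in pks if row_hash(source.get(pk)) != row_hash(target.get(pk))}


def iterative_verify(source, target, changed_during_initial, changed_before_cutover):
    """Return the set of PKs that fail verification (empty set means identical)."""
    # Steps 1 and 2: initial verification. Mismatching rows, plus rows that
    # changed while the initial verification ran, go into the reverify queue.
    queue = mismatched(source, target, source.keys() | target.keys())
    queue |= set(changed_during_initial)
    # Step 3: reverify before cutover so the queue shrinks; rows that now
    # match are dropped, rows changed in the meantime are queued.
    queue = mismatched(source, target, queue)
    queue |= set(changed_before_cutover)
    # Step 4: during cutover, any row that still mismatches is a failure.
    return mismatched(source, target, queue)
```

For example, with `source = {1: ("a",), 2: ("b",)}` and `target = {1: ("a",), 2: ("x",)}`, calling `iterative_verify(source, target, [], [])` returns `{2}`.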

A proof of concept TLA+ verification of this algorithm is done in


InlineVerifier

InlineVerifier verifies the source and target inline with the other components, with a few slight differences from the IterativeVerifier above. The primary difference is that this verification process happens while the data is being copied by the DataIterator instead of after the fact.

With regards to the DataIterator and BatchWriter:

  1. While selecting the data in the DataIterator, a fingerprint is appended to the end of the statement that SELECTs data from the source, as SELECT *, MD5(...) FROM ...

  2. The fingerprint, gathered from the MD5(...) of the query above, is stored on the RowBatch to be used in the next verification step.

  3. The BatchWriter then attempts to write the RowBatch, but instead of inserting it directly, the following process is taken:

    1. A transaction is opened.

    2. The data contained in the RowBatch is inserted.

    3. The PK and fingerprint are then SELECTed from the Target as SELECT pk, MD5(...) FROM ...

    4. The fingerprint (MD5) is then checked against the fingerprint currently stored on the RowBatch.

    The process in step 3 above is retried (with a limit) if there is a failure or mismatch, and the run fails if the rows are not verified within the retry limit.
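The insert-then-check flow above can be tried out with a small, self-contained sketch. It is not Ghostferry code: it uses two in-memory SQLite databases in place of the MySQL source and target, registers a stand-in `MD5` SQL function (SQLite has none built in), and the table `t`, the helper names, and the fingerprint format are all assumptions made for the example.

```python
import hashlib
import sqlite3


def md5_hex(*values):
    """Stand-in for a MySQL MD5(...) fingerprint: hash the columns joined by '|'."""
    joined = "|".join("" if v is None else str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def open_db():
    db = sqlite3.connect(":memory:")
    db.create_function("MD5", -1, md5_hex)  # -1: accept any number of arguments
    db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, data TEXT)")
    return db


def read_batch(source):
    """DataIterator side: select the rows together with their fingerprints."""
    rows = source.execute("SELECT id, data, MD5(id, data) FROM t ORDER BY id").fetchall()
    return [(pk, data) for pk, data, _ in rows], {pk: fp for pk, _, fp in rows}


def write_batch(target, rows, fingerprints, max_retries=3):
    """BatchWriter side: insert, re-select fingerprints in the same transaction, compare."""
    pks = [pk for pk, _ in rows]
    placeholders = ",".join("?" * len(pks))
    for _ in range(max_retries):
        try:
            with target:  # commits on success, rolls back if an exception is raised
                target.executemany("INSERT OR REPLACE INTO t (id, data) VALUES (?, ?)", rows)
                found = dict(target.execute(
                    f"SELECT id, MD5(id, data) FROM t WHERE id IN ({placeholders})", pks))
                if found != fingerprints:
                    raise ValueError("fingerprint mismatch")
            return True
        except (sqlite3.Error, ValueError):
            continue  # retry, up to the limit
    return False
```

A batch read with `read_batch` and written with `write_batch` returns `True` when the fingerprints agree. If a fingerprint is tampered with, every attempt is rolled back and `write_batch` returns `False`, which corresponds to failing the run.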

With regards to the BinlogStreamer:

  1. As DMLs are observed by the BinlogStreamer, the PKs of the events are placed into a reverifyStore to be periodically verified for correctness.

  2. This continues to happen in the background throughout the process of the Run.

  3. If a PK is found not to match, it is added back into the reverifyStore to be verified again.

  4. When VerifyBeforeCutover starts, the InlineVerifier will verify enough of the events in the reverifyStore to ensure it has a sufficiently small number of events that can be successfully verified before cutover.

  5. When VerifyDuringCutover begins, all of the remaining events in the reverifyStore are verified and any mismatches are returned.
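As a rough model of the reverifyStore behaviour described above, the following sketch keeps a set of PKs, re-queues mismatches, and separates the "before cutover" and "during cutover" phases. The class and function names, the `max_remaining` threshold, and the dict-based source and target are assumptions for illustration and do not reflect Ghostferry's actual types.

```python
import hashlib


def fingerprint(row):
    """Hypothetical row fingerprint: MD5 over the row's repr."""
    return hashlib.md5(repr(row).encode("utf-8")).hexdigest()


class ReverifyStore:
    """In-memory stand-in: the set of PKs that still need to be verified."""

    def __init__(self):
        self.pks = set()

    def add(self, pk):
        # Called for every DML event observed on the binlog.
        self.pks.add(pk)

    def verify_once(self, source, target):
        """Verify every queued PK; mismatches are put back into the store."""
        batch, self.pks = self.pks, set()
        mismatches = {pk for pk in batch
                      if fingerprint(source.get(pk)) != fingerprint(target.get(pk))}
        self.pks |= mismatches
        return mismatches


def verify_before_cutover(store, source, target, max_remaining, max_rounds=10):
    """Reverify until the store is small enough to finish quickly during cutover."""
    for _ in range(max_rounds):
        if len(store.pks) <= max_remaining:
            return True
        store.verify_once(source, target)
    return len(store.pks) <= max_remaining


def verify_during_cutover(store, source, target):
    """Verify everything that is left and return the PKs that still mismatch."""
    return store.verify_once(source, target)
```

In this model a non-empty result from `verify_during_cutover` is what would be reported as a verification failure.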


TargetVerifier

TargetVerifier ensures that data on the Target is not corrupted during the move process. It is meant to be used in conjunction with one of the verifiers above.

It uses a configurable annotation string, prepended to each DML, that acts as a verified “signature” of all of Ghostferry’s operations on the Target:

  1. A BinlogStreamer is created and attached to the Target.

  2. As this BinlogStreamer receives DML events, it attempts to extract the annotation from each of the RowsEvents.

  3. If an annotation is not found for the DML, or the extracted annotation does not match the configured annotation of Ghostferry, an error is returned and the process fails.
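The annotation check in steps 2 and 3 can be illustrated with a minimal sketch. It assumes, purely for the example, that the annotation is a SQL comment at the very start of the statement and that the configured value is `application:example`. Neither the format nor the function names below are taken from Ghostferry.

```python
import re

# Assumption: the annotation is a /* ... */ comment prepended to the DML.
ANNOTATION_RE = re.compile(r"^\s*/\*(.*?)\*/")


def extract_annotation(query):
    """Return the leading comment's content, or None if there is none."""
    match = ANNOTATION_RE.match(query)
    return match.group(1).strip() if match else None


def check_dml(query, configured_annotation):
    """Raise ValueError if the DML is not signed with the configured annotation."""
    annotation = extract_annotation(query)
    if annotation is None:
        raise ValueError(f"no annotation found on DML: {query!r}")
    if annotation != configured_annotation:
        raise ValueError(f"unexpected annotation {annotation!r} on DML: {query!r}")
```

Under these assumptions, `check_dml("/*application:example*/ INSERT INTO t VALUES (1)", "application:example")` passes silently, while a statement with no leading comment or with a different annotation raises an error.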

The TargetVerifier needs to be manually stopped before cutover. If it is not stopped, it may detect writes from the application (that are not from Ghostferry) and fail the run. Stopping before cutover also gives the TargetVerifier the opportunity to inspect all of the DMLs in its BinlogStreamer queue to ensure no corruption of the data has occurred.