Verifiers

Verifiers in Ghostferry are designed to ensure that Ghostferry did not corrupt or miss data. There are three different verifiers: the ChecksumTableVerifier, the InlineVerifier, and the TargetVerifier. A comparison of the ChecksumTableVerifier and the InlineVerifier is given below:
|  | ChecksumTableVerifier | InlineVerifier |
| --- | --- | --- |
| Mechanism | CHECKSUM TABLE | Verify each row after insert; reverify changed rows before and during cutover |
| Impact on cutover time | Linear w.r.t. data size | Linear w.r.t. change rate [1] |
| Impact on copy time [2] | None | Linear w.r.t. data size |
| Memory usage | Minimal | Linear w.r.t. rows changed |
| Partial table copy | Not supported | Supported |
| Worst-case scenario | Large databases cause unacceptable downtime | Verification is slower than the change rate of the DB |

[1] Additional improvements could be made to reduce this, as long as Ghostferry is faster than the rate of change. See https://github.com/Shopify/ghostferry/issues/13.

[2] An increase in copy time does not increase downtime; downtime occurs only during cutover.
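The comparison step behind CHECKSUM TABLE can be illustrated with a short sketch. This is not Ghostferry's implementation (Ghostferry is written in Go); it is a minimal Python illustration assuming the checksums have already been collected by running CHECKSUM TABLE against the same tables on both databases.

```python
# Illustrative sketch of the ChecksumTableVerifier's comparison step.
# Each dict maps a table name to the checksum reported by
# `CHECKSUM TABLE <name>` on that database (values here are made up).

def mismatched_tables(source_checksums, target_checksums):
    """Return the tables whose checksums differ between source and target.

    A table present on only one side also counts as a mismatch.
    """
    all_tables = set(source_checksums) | set(target_checksums)
    return sorted(
        t for t in all_tables
        if source_checksums.get(t) != target_checksums.get(t)
    )

source = {"users": 1965421974, "orders": 330958041}
target = {"users": 1965421974, "orders": 123456789}
print(mismatched_tables(source, target))  # ['orders']
```

Because the whole table is checksummed at once, this check only makes sense after writes have stopped, which is why its cost lands entirely in the cutover window.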
If you want verification, try the ChecksumTableVerifier first if you're copying whole tables at a time. If that takes too long, try the InlineVerifier. Alternatively, you can verify in a staging run and skip verification during the production run (see Running ghostferry-copydb in production).

Note that the InlineVerifier on its own may miss some cases; using it together with the TargetVerifier is recommended if those cases are possible.
| Condition | ChecksumTable | Inline | Inline + Target |
| --- | --- | --- | --- |
| Data inconsistency due to Ghostferry issuing an incorrect UPDATE on the target database (example: encoding-type issues) | Yes [3] | Yes | Yes |
| Data inconsistency due to Ghostferry failing to INSERT on the target database | Yes | Yes | Yes |
| Data inconsistency due to Ghostferry failing to DELETE on the target database | Yes | Yes | Yes |
| Data inconsistency due to a rogue application issuing writes (INSERT/UPDATE/DELETE) against the target database | Yes | Sometimes [4] | Yes |
| Data inconsistency due to missing binlog events when Ghostferry is resumed from the wrong binlog coordinates | Yes | Sometimes [5] | Sometimes [5] |
| Data inconsistency if Ghostferry's binlog writing implementation is incorrect and modifies the wrong row on the target (example: an UPDATE is supposed to go to id = 1 but Ghostferry instead issues a query for id = 2). This is an unrealistic scenario, included for illustrative purposes. | Yes | Probably not [6] | Probably not [6] |

[3] Note that the CHECKSUM TABLE statement is broken in MySQL 5.7 for tables with JSON columns: even if two tables are identical, they can emit different checksums. See https://bugs.mysql.com/bug.php?id=87847. This applies to every row in this table.

[4] If the rows modified by the rogue application are modified again on the source after Ghostferry starts, the InlineVerifier's binlog tailer should pick up those rows and attempt to reverify them.

[5] If the rows missed after resume are modified again on the source after Ghostferry starts, the InlineVerifier's binlog tailer should pick up those rows and attempt to reverify them.

[6] If the implementation of the Ghostferry algorithm is this broken, chances are the InlineVerifier won't catch it either, as it relies on the same algorithm to enumerate the table and tail the binlogs.
IterativeVerifier (Deprecated)

NOTE! This is a deprecated verifier. Use the InlineVerifier instead.

IterativeVerifier verifies the source and target in a few steps:

1. After the data copy, it first compares the hash of each applicable row on the source against the hash of the corresponding row on the target to make sure they are the same. This is known as the initial verification.
   a. If they are the same: the verification for that row is complete.
   b. If they are not the same: the row is added to a reverify queue.
2. Any rows changed during the initial verification are also added to the reverify queue.
3. After the initial verification, the rows' hashes in the reverify queue are verified again. This is done to reduce the time needed to reverify during the cutover, as the reverify queue is expected to shrink during this step.
4. During the cutover stage, all rows' hashes in the reverify queue are verified.
   a. If they are the same: the verification for that row is complete.
   b. If they are not the same: the verification fails.
5. If no verification failure occurs, the source and the target are identical. If a verification failure does occur (4b), the source and target are not identical.

A proof-of-concept TLA+ specification of this algorithm is available at https://github.com/Shopify/ghostferry/tree/iterative-verifier-tla.
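The queue-based flow above can be sketched as a small simulation. This is purely illustrative (Ghostferry is written in Go and computes row hashes inside MySQL); the function and parameter names here are made up for the sketch.

```python
import hashlib

# Toy simulation of the IterativeVerifier's reverify-queue flow
# (illustrative only; not Ghostferry's actual API).

def row_hash(row):
    """Hash a row's values, standing in for the per-row hash comparison."""
    return hashlib.md5(repr(sorted(row.items())).encode()).hexdigest()

def iterative_verify(source, target, changed_during_initial):
    """source/target: dicts mapping primary key -> row dict.

    Returns the set of PKs that still mismatch at cutover (step 4b);
    an empty set means verification passed (step 5).
    """
    reverify = set()

    # Step 1: initial verification over all source rows.
    for pk, row in source.items():
        if pk not in target or row_hash(row) != row_hash(target[pk]):
            reverify.add(pk)  # step 1b

    # Step 2: rows changed during the initial verification are re-queued.
    reverify |= set(changed_during_initial)

    # Steps 3-4: reverify the queued rows. Here the queue drains in one
    # pass; the real verifier repeats step 3 before cutover to shrink it.
    failures = set()
    for pk in reverify:
        same = (
            pk in source and pk in target
            and row_hash(source[pk]) == row_hash(target[pk])
        ) or (pk not in source and pk not in target)
        if not same:
            failures.add(pk)  # step 4b
    return failures
```

In the real system, only the final drain of the queue happens during cutover, which is why cutover time scales with the change rate rather than the data size.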
InlineVerifier

InlineVerifier verifies the source and target inline with the other components, with a few slight differences from the IterativeVerifier above. The primary difference is that this verification happens while the data is being copied by the DataIterator, rather than after the fact.

With regards to the DataIterator and BatchWriter:

1. While selecting the data in the DataIterator, a fingerprint is appended to the end of the statement that SELECTs data from the source, as `SELECT *, MD5(...) FROM ...`.
2. The fingerprint, gathered from the `MD5(...)` of the query above, is stored on the RowBatch to be used in the next verification step.
3. The BatchWriter then attempts to write the RowBatch, but instead of inserting it directly, the following process is taken:
   a. A transaction is opened.
   b. The data contained in the RowBatch is inserted.
   c. The PK and fingerprint are then SELECTed from the target, as `SELECT pk, MD5(...) FROM ...`.
   d. The fingerprint (MD5) is then checked against the fingerprint currently stored on the RowBatch.

The process in step 3 above is retried (with a limit) if there is a failure or mismatch, and the run fails if the rows are not verified within the retry limit.
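As a rough illustration of the insert-then-verify transaction in step 3, here is a sketch using SQLite. The names are assumptions, not Ghostferry's actual interfaces, and one deliberate deviation is noted in the comments: Ghostferry computes the fingerprint with MySQL's MD5() server-side, while this sketch hashes client-side because SQLite has no MD5 function.

```python
import hashlib
import sqlite3

# Illustrative sketch of the BatchWriter's write-and-verify loop.

def fingerprint(values):
    # Stand-in for the server-side MD5(...) fingerprint in the real queries.
    return hashlib.md5("|".join(map(str, values)).encode()).hexdigest()

def write_batch_verified(conn, rows, max_retries=3):
    """rows: list of (pk, data) tuples. Returns True once the batch verifies."""
    # In Ghostferry, the expected fingerprints come from the source SELECT
    # and travel on the RowBatch; here we compute them from the input rows.
    expected = {pk: fingerprint((pk, data)) for pk, data in rows}
    for _ in range(max_retries):
        with conn:  # transaction: commits on success, rolls back on error
            conn.executemany(
                "INSERT OR REPLACE INTO t (pk, data) VALUES (?, ?)", rows
            )
            # Read back PK + fingerprint from the target within the transaction.
            actual = {
                pk: fingerprint((pk, data))
                for pk, data in conn.execute("SELECT pk, data FROM t")
                if pk in expected
            }
        if actual == expected:
            return True  # batch verified
    return False  # run should fail: retry limit exceeded

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, data TEXT)")
print(write_batch_verified(conn, [(1, "a"), (2, "b")]))  # True
```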
With regards to the BinlogStreamer:

- As DMLs are observed by the BinlogStreamer, the PKs of the events are placed into a reverifyStore to be periodically verified for correctness. This continues to happen in the background throughout the run.
- If a PK is found not to match, it is added back into the reverifyStore to be verified again.
- When VerifyBeforeCutover starts, the InlineVerifier verifies enough of the events in the reverifyStore to ensure a sufficiently small number of events remain that they can be successfully verified during cutover.
- When VerifyDuringCutover begins, all of the remaining events in the reverifyStore are verified and any mismatches are returned.
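The reverifyStore behavior above can be sketched as follows. The class and method names are illustrative only, not Ghostferry's actual Go types: PKs seen in binlog DML events accumulate in the store, background passes drain it, and any mismatches are re-queued for a later pass.

```python
# Toy reverifyStore (illustrative; not Ghostferry's actual implementation).

class ReverifyStore:
    def __init__(self):
        self.pks = set()

    def add(self, pk):
        """Called for each PK observed in a binlog DML event."""
        self.pks.add(pk)

    def verify_batch(self, matches):
        """Drain the store; matches(pk) -> bool stands in for comparing
        source and target fingerprints. Mismatched PKs are re-queued."""
        mismatched = {pk for pk in self.pks if not matches(pk)}
        self.pks = mismatched  # only the PKs that failed remain queued
        return mismatched

store = ReverifyStore()
for pk in [1, 2, 3]:
    store.add(pk)
# Suppose row 2 still differs between source and target:
remaining = store.verify_batch(lambda pk: pk != 2)
print(sorted(remaining))  # [2]
```

Repeated background passes shrink the store, which is what VerifyBeforeCutover relies on: cutover only has to drain whatever small set remains.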
TargetVerifier

TargetVerifier ensures that data on the target is not corrupted during the move and is meant to be used in conjunction with one of the verifiers above.

It uses a configurable annotation string, prepended to Ghostferry's DMLs, that acts as a verified "signature" of all of Ghostferry's operations on the target:

1. A BinlogStreamer is created and attached to the target.
2. As this BinlogStreamer receives DML events, it attempts to extract the annotation from each of the RowsEvents.
3. If an annotation is not found for the DML, or the extracted annotation does not match the configured annotation of Ghostferry, an error is returned and the process fails.
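The annotation check can be sketched as below. This is a simplification under stated assumptions: the real verifier inspects RowsEvents decoded from the binlog rather than raw SQL strings, and the annotation format shown here (a leading SQL comment) is an assumption for illustration.

```python
import re

# Illustrative sketch of the TargetVerifier's annotation check.
# The annotation string is configurable; this particular value is made up.
ANNOTATION = "/*application:ghostferry*/"

def verify_dml(statement, annotation=ANNOTATION):
    """Raise unless the DML carries the configured annotation as a prefix."""
    match = re.match(r"^/\*.*?\*/", statement)
    if match is None or match.group(0) != annotation:
        raise ValueError("DML on target not issued by Ghostferry: %r" % statement)

verify_dml("/*application:ghostferry*/ UPDATE t SET a = 1 WHERE pk = 5")  # passes
try:
    verify_dml("UPDATE t SET a = 2 WHERE pk = 5")  # rogue write: no annotation
except ValueError:
    print("rogue write detected")
```

Any write that reaches the target without the annotation, such as one from a rogue application, fails this check, which is exactly the class of inconsistency the InlineVerifier alone can miss.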
The TargetVerifier needs to be manually stopped before cutover. If it is not stopped, it may detect writes from the application (that are not from Ghostferry) and fail the run. Stopping it before cutover also gives the TargetVerifier the opportunity to inspect all of the DMLs remaining in its BinlogStreamer queue and ensure no corruption of the data has occurred.