RAIB
Revision as of 1 June 2007, 14:54
Redundant Array of Independent dataBases
The goal of RAIB (for Firebird 3.0) is to ensure the availability of a Firebird database even if hardware fails. This is done with fail-over servers (spares) or with log packets (.flog files) on a remote filesystem. In other environments (without RAIB) you lose all data from the moment of your last backup until the moment of failure. With RAIB you are able to bring your database back up to the state it was in. This is just a concept, BUT implementing these ideas would make Firebird 3.0 (with the Vulcan engine) unbreakable.
Remember that the malfunction of a single transistor in your high-end server can bring Firebird down (and modern servers have billions of them). Maybe your hard disk is OK, or you rely on RAID 5 storage. But the question is: how long does it take to get the Firebird service online again? With RAIB you can immediately connect your application to a spare if you restrict yourself to read-only SQL statements; an admin can then switch the spare to master mode, so that all the functionality of the Firebird database can be used.
How it will work
The work is done by five (new?) mechanisms, none of which is implemented in Firebird yet:
- Heart-Beat-Sequence: a BIGINT living inside the database that counts the changes ("impacts") made to the database since the "CREATE DATABASE" statement. It is used to name the Statement-Log-Packets.
- Statement-Log-Packets: log DDL/DML statements in such a way that they can form the basis of a statement "playback".
- Snapshot backups: a backup mode in which all changes made AFTER the backup was started are invisible to the backup.
- Statement-Playback: a system in which a server can execute Statement-Log-Packets.
- Playback-Mode: gbak puts a database into this mode while playing statements back into it.
The spares communicate with a "master" (or "primary server") and receive all the information they need to keep their copy of the database in sync with the master. If the master fails, a spare can be used to carry out further transactions. This is NOT done with replication technologies, but with Statement-Log-Packets for each "write" operation on the database.
Heart-Beat-Sequence
SEQ_HEART_BEAT
A database has an internal GENERATOR (the Heart-Beat-Sequence, abbreviated HBS or HBT, a BIGINT) that is incremented after each committed statement that changes data or metadata. A read-only statement should not increment it.
R/W Detector
Each request to the database carries an R/W flag that is set on its way through the engine. If the engine detects a change to data or metadata, it sets the "Write" flag: the statement has now been proven to have an impact on the database. It is a "W" statement, and at its end it must increment SEQ_HEART_BEAT and a statement log must be written.
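The interplay of the R/W flag and the heartbeat can be sketched as follows. This is only a toy model in Python, not Firebird code: the classes `Request` and `Database` and the in-memory `rows` dictionary are assumptions made for illustration, and `heart_beat` merely stands in for SEQ_HEART_BEAT.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """One statement on its way through the engine (hypothetical model)."""
    sql: str
    wrote: bool = False  # the R/W flag; every request starts as "R"


@dataclass
class Database:
    """Toy stand-in for a database; not a real Firebird structure."""
    heart_beat: int = 0                       # plays the role of SEQ_HEART_BEAT
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)   # (HBT, sql) pairs

    def read(self, req, key):
        return self.rows.get(key)             # reads never touch the flag

    def write(self, req, key, value):
        self.rows[key] = value
        req.wrote = True                       # the engine detected a change

    def finish(self, req):
        """At the end of the statement: classify it and, if "W", log it."""
        if not req.wrote:
            return ("R", None)
        self.heart_beat += 1
        self.log.append((self.heart_beat, req.sql))
        return ("W", self.heart_beat)
```

A request that only calls `read` finishes as `("R", None)` and leaves the heartbeat untouched; a request that calls `write` finishes as `("W", n)` with the new heartbeat value, which could then name the log packet.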
(add an) RDB$Cluster table
This table inside the database holds the information about the spares and the master.
IP               Role    last-HBT
192.168.115.192  master
192.168.115.196  spare
192.168.115.80   spare
30.23.12.3       spare
Planned switching of the role from master to spare
- the master no longer accepts "W" statements ("Try again in 2 minutes!").
- the master ensures that one spare is ready to take over (master.HBT = spare.HBT).
- the master switches to "spare" mode.
- the spare switches to "master" mode.
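The four steps above might look like this in a minimal Python sketch. The `Node` class and the `planned_switch` function are hypothetical names, and the decision to abort and resume accepting writes when the spare lags behind is an assumption; the text does not say what should happen in that case.

```python
class Node:
    """Hypothetical server record: name, role and last applied heartbeat."""

    def __init__(self, name, role, hbt=0):
        self.name = name
        self.role = role
        self.hbt = hbt
        self.accept_writes = (role == "master")


def planned_switch(master, spare):
    """Swap roles following the four steps; raise if the spare lags behind."""
    master.accept_writes = False      # step 1: refuse "W" statements
    if spare.hbt != master.hbt:       # step 2: the spare must be caught up
        master.accept_writes = True   # assumption: abort and stay master
        raise RuntimeError("spare is behind the master")
    master.role = "spare"             # step 3
    spare.role = "master"             # step 4
    spare.accept_writes = True
```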
Database in master mode
No limitations. The master must publish the LOG packets to the outside world:
- by UDP to the spares.
- by fopen to the filesystem.
- by move into memory.
Database in spare mode
- the LOG-Receive thread
The master initiates communication with a spare and posts LOG packets to it. All packets are flushed into memory; if the CRC is OK, the spare sends an ACK to the master. The master does NOT wait until the spare has executed the statement, because the master has already proven that the statement is executable.
- the Worker thread
It reads the LOG packets and executes them against its own database.
- LOG-missed
If the spare was offline for a while, it sees that LOG.HBT <> local.HBT+1. Before it can execute the current LOG packets it MUST ask the master (or other spares) for the missing LOG packets. After working through the backlog it can start executing the currently received LOG packets. During this time the LOG-Receive thread keeps filling the LOG buffer.
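The receive, work and catch-up behaviour of a spare can be illustrated with a small single-threaded Python sketch. Everything here is an assumption made for illustration: the `Spare` class, the use of CRC-32 from `zlib` as "the CRC", and the `fetch_missing` callback that stands in for asking the master (or another spare) for an old packet.

```python
import zlib


class Spare:
    """Toy model of a spare; threads are collapsed into plain method calls."""

    def __init__(self, hbt=0):
        self.hbt = hbt        # last heartbeat applied locally
        self.buffer = {}      # received but not yet applied packets, by HBT
        self.applied = []     # statements executed so far, in order

    def receive(self, hbt, sql, crc):
        """LOG-Receive: buffer the packet; True means ACK, False means no ACK."""
        if zlib.crc32(sql.encode("utf-8")) != crc:
            return False
        self.buffer[hbt] = sql
        return True

    def missing(self):
        """Heartbeats between the local state and the newest buffered packet
        for which no packet has been received."""
        if not self.buffer:
            return []
        newest = max(self.buffer)
        return [n for n in range(self.hbt + 1, newest) if n not in self.buffer]

    def work(self, fetch_missing):
        """Worker: fill the gaps first, then apply packets in HBT order."""
        for n in self.missing():
            self.buffer[n] = fetch_missing(n)
        while self.hbt + 1 in self.buffer:
            self.hbt += 1
            self.applied.append(self.buffer.pop(self.hbt))
```

For example, a spare at heartbeat 1 that receives packet 4 reports 2 and 3 as missing, fetches them, and only then applies 2, 3 and 4 in order.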
New database "Shutdown" modes
The database works normally but refuses "W" statements with a special error message. Such statements may be redirected to a master.
Statement LOG
An SQL statement like "DROP TABLE HUGEONE" may purge thousands of pages of the database file and modify one or two. The LOG packet itself should NOT contain all the changed database pages or similar low-level details, but the SQL statement itself. We can rely on the fact that the spares have all the knowledge needed to understand the statement and execute it exactly like the master. This information is transmitted to one or more spares. It is also stored in the database itself for a time (not as part of a backup), at least until it is proven that a backup (n.fbak) is available. After the Log-DataBlock must successfully
- Execute the statement and detect while doing so whether it changes the database (W or R).
- If W: compile a log block with the name GEN_ID(HBT,1); if R: stop.
- Send the LOG over the line.
Client: DROP TABLE CLUB$2873
Server: W,928372
Client: SELECT CURRENT_TIMESTAMP from RDB$DATABASE
Server: R
Generating a LOG-Packet
The LOG packet is assembled by the engine while it parses and executes the request. The LOG packet is a memory block that holds all the information needed for the playback. If a "W" statement reads CURRENT_TIMESTAMP (timestamp) or RND (randomize), a copy of the returned value MUST be stored in the LOG, because the spares cannot rely on their own values. If a request has been fully executed and the engine did not set the "W" flag, the LOG packet is purged. Otherwise the packet is propagated to the spares. The server itself holds a persistent copy of the LOG packet to help spares get up to date if they missed something. Only a successful backup can delete the LOG packets that were generated before the heartbeat value (HBS) stored in the .fbak.
Character of the LOG packet
Statement
UPDATE PERSON SET MOMENT = CURRENT_TIMESTAMP WHERE LOAN = RND(2920)
LOG packet
P:RND=2093
P:CURRENT_TIMESTAMP
P:CURRENT_TIMESTAMP
P:CURRENT_TIMESTAMP
S:UPDATE PERSON ...
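Why the returned values have to travel with the packet can be shown with a record-and-replay sketch in Python. The names `Recorder`, `Player` and `run_update` are hypothetical, the rows are plain dictionaries, and the sketch assumes RND is evaluated once per statement and CURRENT_TIMESTAMP once per updated row, which is one way to read the example packet.

```python
import random
from datetime import datetime


class Recorder:
    """Master side: evaluate non-deterministic functions, remember results."""

    def __init__(self):
        self.params = []                  # (name, value) pairs in call order

    def call(self, name, fn, *args):
        value = fn(*args)
        self.params.append((name, value))
        return value


class Player:
    """Spare side: hand back the recorded values instead of fresh ones."""

    def __init__(self, params):
        self._params = list(params)

    def call(self, name, fn=None, *args):
        recorded_name, value = self._params.pop(0)
        if recorded_name != name:
            raise ValueError(f"expected {recorded_name}, got {name}")
        return value


def run_update(ctx, persons):
    """Stand-in for:
    UPDATE PERSON SET MOMENT = CURRENT_TIMESTAMP WHERE LOAN = RND(2920)"""
    loan = ctx.call("RND", random.randrange, 2920)
    for person in persons:
        if person["LOAN"] == loan:
            person["MOMENT"] = ctx.call("CURRENT_TIMESTAMP", datetime.now)
    return persons
```

Running `run_update` with a `Recorder` on the master and then with a `Player` built from `recorder.params` on a spare leaves both row sets identical, even though the spare never calls the random generator or the clock itself.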
Detecting the failure
- The master is not responding.
- Spares report that they have problems executing nonsensical packets.
Positive side effects
If a client assumes that a particular statement is "R" AND the HBT of a spare equals the HBT of the master, it can send this statement to the spare to take load off the master. The spare will return the same result as the master.
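The routing rule for read-only statements fits in a few lines of Python. The function name `choose_server` and the dictionary of spare heartbeats are assumptions; how a client would actually learn the heartbeat values is not specified in the concept.

```python
def choose_server(is_read_only, master_hbt, spare_hbts):
    """Return the name of a spare that may serve the statement, else "master".

    spare_hbts maps a (hypothetical) spare name to its last applied HBT.
    """
    if is_read_only:
        for name, hbt in spare_hbts.items():
            if hbt == master_hbt:     # spare is fully in sync with the master
                return name
    return "master"
```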
Communication master<->spare
If a spare comes up (on a connection request from a master), the spare can ask the master how
Good to have several spares
It is enough for the master to have only one responding spare. If a connection to a spare fails, the master retries it every 20 seconds.
New "Shut"-Down Modes
be a local spare
The master writes the LOG packets (file name <HBT>.flog) to a (remote) filesystem.
be a cold spare
A Firebird server receives the LOG packets but does not execute them. The worker thread writes the .flog files to the filesystem when the system is idle.
be a hot spare
A Firebird server receives and executes the LOG packets as fast as possible.
be a master
Restore scenario
- restore the database from a .fbak
- read the value of GEN_HBT
- let the server execute all *.flog files, beginning with GEN_HBT+1
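Selecting the packets for the last restore step could be sketched like this in Python. The function name `flog_files_to_replay` is hypothetical, the files are assumed to be named `<HBT>.flog` with a plain decimal number, and refusing to continue when a packet is missing from the sequence is an assumption rather than something the concept prescribes.

```python
from pathlib import Path


def flog_files_to_replay(log_dir, backup_hbt):
    """Return the <HBT>.flog files newer than the backup, in HBT order.

    backup_hbt is the GEN_HBT value read from the restored database.
    """
    packets = []
    for path in Path(log_dir).glob("*.flog"):
        if path.stem.isdigit() and int(path.stem) > backup_hbt:
            packets.append((int(path.stem), path))
    packets.sort()

    # Assumption: replay must be gapless, so stop if a packet is missing.
    expected = backup_hbt + 1
    for hbt, _ in packets:
        if hbt != expected:
            raise FileNotFoundError(f"missing log packet {expected}.flog")
        expected += 1
    return [path for _, path in packets]
```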
New error messages
"W" statement not allowed on a spare
The client must understand a new error message when it tries to send a statement to a spare that was previously a master. The client must interpret the new connection string and (re)prepare any open statements.
- Error message with a retry option: "I'm a spare, please retry with [%server:%database]"
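On the client side, interpreting the retry hint could look like the following Python sketch. The function `parse_redirect` is hypothetical and the exact wording of the message is taken from the proposal above; a real implementation would more likely use an error code with separate fields than parse free text.

```python
import re

# Matches the "[server:database]" part of the proposed retry message.
REDIRECT = re.compile(
    r"please retry with \[(?P<server>[^:\]]+):(?P<database>[^\]]+)\]"
)


def parse_redirect(message):
    """Return (server, database) from the retry hint, or None if absent."""
    match = REDIRECT.search(message)
    if match is None:
        return None
    return (match.group("server"), match.group("database"))
```

The client would then reconnect to the returned server and database and re-prepare its open statements there.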
Failed to log your statement
If the master cannot log the statement to any spare, this error message is generated.