RAIB

From OrgaMon Wiki

Redundant Array of Independent dataBases

The goal of RAIB (for Firebird) is to keep a Firebird database available even if hardware fails. This is done with fail-over servers (spares) or with log packets (.flog files) on a remote filesystem. This is only a concept, BUT implementing these ideas would make Firebird as unbreakable as the internet.
Remember that the malfunction of a single transistor in your high-end server can bring Firebird down, and modern servers have thousands upon thousands of them.

How it will work

The work is done by four (new?) mechanisms, not yet implemented in Firebird:

Heart-Beat-Sequence: a BIGINT living inside the database that counts the changes made to the database since "CREATE DATABASE". It is used to name the Statement-Log-Packets.
Statement-Log-Packets: log DDL/DML statements in a form that can serve as the basis for a statement "playback".
Snapshot backups: a backup in which changes made AFTER the backup was started are invisible to the backup.
Statement-Playback: a mechanism that lets a server execute "Statement-Packets". Playback mode: gbak puts a database into this mode while playing statements back into it.

The spares communicate with a "master" or "primary server" and receive all the information needed to keep their copy of the database in sync with the master. If the master fails, a spare can take over and handle further transactions. This is NOT done with replication technologies, but with Statement-Log-Packets for each "write" operation on the database.


(implement a) Heart-Beat-Ticker

A database has an internal GENERATOR (the Heart-Beat-Ticker = HBT [BIGINT]) that is incremented after each committed statement that changes data or metadata. A read-only statement should not increment the HBT.
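The ticker rule can be illustrated with a small in-memory model. This is only a sketch: the class `HeartBeatTicker` and its `on_commit` method are hypothetical names, and a real implementation would live inside the Firebird engine as a generator, not in client code.

```python
import threading


class HeartBeatTicker:
    """In-memory stand-in for the HBT generator (hypothetical name)."""

    def __init__(self, start: int = 0) -> None:
        self._value = start
        self._lock = threading.Lock()

    @property
    def value(self) -> int:
        return self._value

    def on_commit(self, changed_data_or_metadata: bool) -> int:
        """Advance the ticker only for commits that changed something."""
        with self._lock:
            if changed_data_or_metadata:
                self._value += 1
            return self._value


hbt = HeartBeatTicker()
hbt.on_commit(changed_data_or_metadata=True)    # e.g. a DROP TABLE
hbt.on_commit(changed_data_or_metadata=False)   # e.g. a SELECT, no tick
hbt.on_commit(changed_data_or_metadata=True)    # e.g. an INSERT
print(hbt.value)  # 2
```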

(add an) RDB$Cluster Table

This table inside the database holds the information about the spares and the master.

IP               Role       last-HBT
192.168.115.192  master
192.168.115.196  spare
192.168.115.80   spare
30.23.12.3       spare

Planned switching of the role from master to spare

  1. The master no longer accepts W statements. (Try again in 2 minutes!)
  2. The master ensures that one spare is ready to take over (master.HBT = spare.HBT).
  3. The master switches to "spare" mode.
  4. The spare switches to "master" mode.
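The four steps can be sketched as a single function. The `Node` type and `planned_switch` are hypothetical names for illustration. The sketch also assumes that the master simply gives up when no spare has caught up, whereas a real server would more likely wait and retry.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    ip: str
    role: str             # "master" or "spare"
    hbt: int              # last applied Heart-Beat-Ticker value
    accepts_writes: bool


def planned_switch(master: Node, spares: List[Node]) -> Node:
    """Hand the master role to a spare that is fully in sync."""
    master.accepts_writes = False                        # step 1
    ready = [s for s in spares if s.hbt == master.hbt]   # step 2
    if not ready:
        master.accepts_writes = True                     # assumption: abort the switch
        raise RuntimeError("no spare has caught up with the master")
    successor = ready[0]
    master.role = "spare"                                # step 3
    successor.role = "master"                            # step 4
    successor.accepts_writes = True
    return successor


old_master = Node("192.168.115.192", "master", hbt=100, accepts_writes=True)
lagging = Node("192.168.115.196", "spare", hbt=99, accepts_writes=False)
in_sync = Node("192.168.115.80", "spare", hbt=100, accepts_writes=False)
new_master = planned_switch(old_master, [lagging, in_sync])
print(new_master.ip, new_master.role)  # 192.168.115.80 master
```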

Database in master mode

Database in spare mode

  • the LOG-Receive Thread

The master initiates communication with a spare and posts LOG packets to it. All packets are flushed into memory; if the CRC is OK, the spare sends an ACK to the master. The master does NOT wait until the spare has executed the statement, because the master has already proved that the statement is executable.

  • the Worker-Thread

Reads the LOG packets and executes them against its own database.

  • LOG-missed

If the spare was offline for a while, it sees that LOG.HBT <> local.HBT+1. Before it can execute the current LOG packets, it MUST ask the master (or other spares) for the missing LOG packets. After working through the backlog, it can start executing the currently received LOG packets. During this time the LOG-Receive thread keeps filling the LOG buffer.
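The gap check of the worker thread might look like the sketch below. It assumes that a packet is in sequence exactly when its HBT equals the spare's local HBT plus one. The names `missing_ticks`, `playback_order` and the `fetch_missing` callback (standing in for a request to the master or another spare) are hypothetical.

```python
from typing import Callable, Dict, List


def missing_ticks(local_hbt: int, received_hbt: int) -> range:
    """HBT values that must be fetched before `received_hbt` can be applied."""
    return range(local_hbt + 1, received_hbt)


def playback_order(local_hbt: int,
                   buffer: Dict[int, str],
                   fetch_missing: Callable[[int], str]) -> List[str]:
    """Return statements in HBT order, filling gaps before buffered packets."""
    if not buffer:
        return []
    ordered = []
    for tick in range(local_hbt + 1, max(buffer) + 1):
        if tick in buffer:
            ordered.append(buffer[tick])          # already received
        else:
            ordered.append(fetch_missing(tick))   # ask master or another spare
    return ordered


# The spare stopped at HBT 5 and now holds packets 8 and 9 in its LOG buffer.
log_buffer = {8: "INSERT INTO T VALUES (8)", 9: "INSERT INTO T VALUES (9)"}
print(list(missing_ticks(5, 8)))  # [6, 7]
print(playback_order(5, log_buffer, lambda tick: f"-- fetched packet {tick}"))
```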

New Database "Shutdown" Modes

The database works normally but refuses "W" statements with a special error message. They may be redirected to a master.


Statement LOG

An SQL statement like "DROP TABLE HUGEONE" may purge thousands of pages of the database file and modify only 1 or 2. So the log packet itself should NOT contain all the changed database pages or similar low-level details, but the SQL statement itself. We can rely on the fact that the spares have all the knowledge needed to understand the statement and execute it. This information is transmitted to one or more spares. It is also stored in the database itself for a while (not as part of a backup), at least until it is proven that a n.fbak is available. After the Log-DataBlock must successfully

  1. Execute the statement and detect, while doing so, whether it changes the database (W or R).
  2. If W: compile a log block named GEN_ID(HBT,1); if R: stop.
  3. Send the LOG over the line.


Client: DROP TABLE CLUB$2873 Server: W,928372

Client: SELECT CURRENT_TIMESTAMP from RDB$DATABASE Server: R
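The steps and the two sample exchanges above can be reproduced with a short sketch. The `Master` class is a hypothetical name, and `fake_execute` is only a stand-in that guesses W or R from the first keyword. In the concept the server detects this while actually executing the statement.

```python
from typing import Callable, List, Tuple


class Master:
    """Hypothetical master-side handling of one statement."""

    def __init__(self, execute: Callable[[str], bool], hbt: int = 0) -> None:
        self._execute = execute   # returns True if data or metadata changed
        self.hbt = hbt
        self.outbox: List[Tuple[int, str]] = []   # log blocks waiting to be sent

    def handle(self, statement: str) -> str:
        wrote = self._execute(statement)            # step 1: execute and classify
        if not wrote:
            return "R"                              # step 2, read-only: stop
        self.hbt += 1                               # step 2: GEN_ID(HBT, 1)
        self.outbox.append((self.hbt, statement))   # step 3: queue for sending
        return f"W,{self.hbt}"


def fake_execute(statement: str) -> bool:
    # Stand-in only: treats everything except SELECT as a write.
    return statement.split()[0].upper() != "SELECT"


master = Master(fake_execute, hbt=928371)
print(master.handle("DROP TABLE CLUB$2873"))                        # W,928372
print(master.handle("SELECT CURRENT_TIMESTAMP FROM RDB$DATABASE"))  # R
```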

Content of a LOG-Packet

  • the Statement
  • the server context: the 'NOW' and 'RND' values that were used.
  • the HBT-Tick after the statement is executed.
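One possible wire layout for such a packet is sketched below. The concept only says that a CRC is checked on receipt; the JSON payload, the 4-byte CRC-32 prefix and the names `LogPacket`, `encode` and `decode` are assumptions made for illustration.

```python
import json
import zlib
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class LogPacket:
    statement: str     # the SQL statement
    now: str           # 'NOW' value the master used
    rnd: List[float]   # 'RND' values the master used
    hbt: int           # HBT tick after the statement was executed


def encode(packet: LogPacket) -> bytes:
    """Serialize as: 4-byte big-endian CRC-32, then the JSON payload."""
    payload = json.dumps(asdict(packet), sort_keys=True).encode("utf-8")
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def decode(wire: bytes) -> LogPacket:
    """Check the CRC before trusting the payload; raise if it does not match."""
    crc, payload = int.from_bytes(wire[:4], "big"), wire[4:]
    if zlib.crc32(payload) != crc:
        raise ValueError("CRC mismatch, packet must be resent")
    return LogPacket(**json.loads(payload.decode("utf-8")))


packet = LogPacket("DROP TABLE CLUB$2873", "2024-01-01T12:00:00", [0.25], 928372)
wire = encode(packet)
print(decode(wire) == packet)  # True
```

Shipping the 'NOW' and 'RND' values with the statement lets the spare reproduce the same result as the master instead of evaluating them again at playback time.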

detecting the failure


Positive Side Effects

If a client assumes that a particular statement is read-only AND the HBT of a spare equals the HBT of the master, it can post this statement to that spare to take load off the master.
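A client-side routing rule along these lines could be sketched as follows. How the client learns the HBT values is left open in the concept. The sketch simply assumes it has a recent copy of them, and `pick_target` is a hypothetical helper.

```python
from typing import Dict


def pick_target(read_only: bool, master_ip: str, hbt_by_ip: Dict[str, int]) -> str:
    """Send reads to an in-sync spare; everything else goes to the master."""
    if read_only:
        master_hbt = hbt_by_ip[master_ip]
        for ip, hbt in hbt_by_ip.items():
            if ip != master_ip and hbt == master_hbt:
                return ip
    return master_ip


cluster = {"192.168.115.192": 100, "192.168.115.196": 99, "192.168.115.80": 100}
print(pick_target(True, "192.168.115.192", cluster))   # 192.168.115.80
print(pick_target(False, "192.168.115.192", cluster))  # 192.168.115.192
```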

and the Statement is Read-Only it can be used as a 


Communication master<->spare

if a spare comes up (on connection request of a master), a spare can ask the master how

Good to have several spares

It is enough for the master to have one responding spare. If a connection to a spare fails, it is retried every 20 seconds.

Spare - Modes

local spare

The master writes the LOG packets (file name <HBT>.flog) to a (remote) filesystem.

cold spare

A Firebird server receives the log packets but does not execute them. The worker thread writes the .flog files to the filesystem when the system is idle.

hot spare

A Firebird server receives and executes the log packets as fast as possible.


Restore Scenario

  1. restore the database from a .fbak
  2. read the value of GEN_HBT
  3. let the server execute all *.flog Files beginning from GEN_HBT+1
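Step 3 depends on picking the right files in the right order. The sketch below assumes the `<HBT>.flog` naming used for the local spare and a hypothetical helper `flog_files_to_replay`. It sorts numerically (so 10.flog comes after 9.flog) and refuses to continue past a gap.

```python
from pathlib import Path
from typing import Dict, List


def flog_files_to_replay(log_dir: Path, gen_hbt: int) -> List[Path]:
    """Return the .flog files after `gen_hbt`, in HBT order, without gaps."""
    by_tick: Dict[int, Path] = {}
    for path in log_dir.glob("*.flog"):
        if path.stem.isdigit():                 # file name is <HBT>.flog
            by_tick[int(path.stem)] = path
    ordered = []
    tick = gen_hbt + 1
    while tick in by_tick:
        ordered.append(by_tick[tick])
        tick += 1
    newer = [t for t in by_tick if t >= tick]
    if newer:
        # A later packet exists, but the one needed next is missing.
        raise FileNotFoundError(f"missing {tick}.flog, cannot replay up to {max(newer)}")
    return ordered
```

For example, after restoring a backup whose GEN_HBT is 8, a directory containing 7.flog to 11.flog would yield 9.flog, 10.flog and 11.flog, in that order.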

Client Protocol Extensions

The client must understand a new error message when it tries to send a statement to a spare that was previously a master. The client must interpret the new connection string and (re)prepare any open statements.

  1. Error message with a retry option: "I'm a spare, please retry with [%server:%database]"
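On the client side, interpreting this message could be as simple as the sketch below. It assumes the literal text "please retry with [server:database]" shown above. The function `parse_redirect` is a hypothetical helper and not part of any existing Firebird client library.

```python
import re
from typing import Optional, Tuple

# Server is everything up to the first colon; the database part may itself
# contain colons (for example a Windows path).
_REDIRECT = re.compile(
    r"please retry with \[(?P<server>[^:\]]+):(?P<database>[^\]]+)\]",
    re.IGNORECASE,
)


def parse_redirect(message: str) -> Optional[Tuple[str, str]]:
    """Return (server, database) from a spare's retry message, else None."""
    match = _REDIRECT.search(message)
    if match is None:
        return None
    return match.group("server"), match.group("database")


msg = "I'm a spare, please retry with [192.168.115.196:/data/orga.fdb]"
print(parse_redirect(msg))  # ('192.168.115.196', '/data/orga.fdb')
```

After a successful parse the client would reconnect to the returned server and prepare its open statements again.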