RAIB: Unterschied zwischen den Versionen
Keine Bearbeitungszusammenfassung |
(→granid) |
||
(70 dazwischenliegende Versionen von 2 Benutzern werden nicht angezeigt) | |||
Zeile 1: | Zeile 1: | ||
(c) '2007-2008 by Andreas Filsinger | |||
<h1> | |||
<b><u>R</u></b>edundant <b><u>A</u></b>rray of <b><u>I</u></b>ndependent data<b><u>B</u></b>ases | |||
</h1> | |||
The goal of RAIB (for firebird 3.0?!) is to ensure the availability of a firebird database - even if hardware fails. This is done by fail-over servers (spares) or Log-Packets (.flog File) on a remote Filesystem. In other Environments (without RAIB) you loose all Data from the Moment of your last Backup until the Moment of faliure. With RAIB you are able to get your Database up to the state where it was. This is just a concept - BUT implementing this ideas will make firebird 3.0 (with the vulcan engine) unbreakable.<br> | |||
Remember that malfunction of a single transistor in your high-end-server will bring firebird down (And modern servers have millions of it). Maybe your HD is OK, or you trust on a RAID 5 Storage. But the Question is: How long does it take to get the firebird-service online again!<br> | |||
With RAIB you can immediatly connect your application to a spare-Host after the failure. In this Moment you are restricted to "read-only" if you reduce to "read only" SQL-Statements, a admin can switch one spare to the master mode, so all the functionality of the firebird database can be used.<br> | |||
<br> | |||
== How it will work == | |||
The work is done by 8 (new?) techniques, not yet implemented in firebird: | |||
# '''Statement-Classification''': by evaluating and executing a statement it is classified as a "R"- "W"- or "I"-Statement.<br> | |||
# '''Heart-Beat-Sequence''': a BIGINT living inside the database, counts the impacts to the database since the Statement "CREATE DATABASE". This will help naming Statement-Log-Packets<br> | |||
# '''Statement-Order''': A single Instance that decides in wich order the incoming Statements are executet. Inside a time window, this instance is free to do this ordering, but the order MUST be logged. | |||
# '''Statement-Packets-Logger''': Logs DDL/DML "W"-Statements in a way so they can build basis of a Statement - "playback".<br> | |||
# '''snap-shot-backups''': a Backup-Mode where all the impact AFTER the Backups was started are invisible to the Backup.<br> | |||
# '''Statement-Playback''': a system where server can execute "Statement-Packets". | |||
# '''Playback-Mode''': gbak sets a database in this mode when doing statement Playback to a database. | |||
# '''Deterministic SQL''': all Statements MUST end in computable results, so they can be executed in a server-swarm | |||
== | == Heart-Beat-Sequence == | ||
SEQ_HEART_BEAT | |||
A Database has an internal SEQUENCE (GENERATOR) (the Heart-Beat-Sequence = HBS [BIGINT]) that is incremented after each W-Statement commit doing some change to data or metadata. A read-Only Statement (I/R) should not incrememt the HBT. | |||
== Serialize == | |||
The master must have a mechanism to force quasi parallel W-Statements in an serialized order. The assigned HBS Value will make clear in wich order the statements are executed. Long time conspumting statements are not a problem because a W-Statement with HBS = n, only sees the Database in the Snap-Shop State n-1. | |||
== Statement Classification == | |||
firebird executes SQL-Code like "DROP TABLE SAMPLE" or Service Requests like "BACKUP DATABASE". If we want something like a spare-server we should filter the requests that really do some change to the Database. The three classes are "W"rite / "R"ead and "I"nformal. | |||
'''"W" Requests''' | |||
'''"R" Requests''' | |||
'''"I" Requests''' | |||
'''Samples''' | |||
<br> | |||
Client-Request: DROP TABLE CLUB$2873<br> | |||
Server-Classification: "W" / SEQ_HEART_BEAT=928372<br> | |||
<br> | |||
Client-Request: SELECT CURRENT_TIMESTAMP from RDB$DATABASE<br> | |||
Server-Classification: "I"<br> | |||
<br> | |||
Client-Request: SELECT COUNT(RID) from HUGETABLE<br> | |||
Server-Classification: R<br> | |||
<br> | |||
== Deterministic SQL == | |||
a quest to impact that a Statement do to the Database, that it will have the same result independant when where and why executed. | |||
Forbidden are Sideeffekts like "CURRENT_TIMESTAMP", "RAND()", results of USER Tabels or PERFORMANCE Tables or Logik that depends on Bits that are Outside the Database itself. | |||
Why do we need this: | |||
# Because of relyable playbacks. | |||
# Because of an exetuting swarm, resulting in a 100% identical database | |||
== GID == | |||
=== as a "changed" marker === | |||
The GID is a global int64 Sequence (Generator) and a "implizit" Field in every DB-Table wich mark or tag Records, Tables, Views and the Database itself. By now there are no arguments for having a GID Granularity set to Field-Level. | |||
* Every Time a "w" statement was completed, the Database GID gets incremented. You can read the actual GID by | |||
select GID from RDB$DATABASE | |||
* Also this new GID Value is store to the affecte TABLE(s). You can read this. | |||
select GID from TABLE1 | |||
// seems the same like | |||
select MAX(GID) from TABLE1 | |||
* Every Record holds its GID | |||
// get the actual GID of a record | |||
select GID from TABLE1 where RID=2029382 | |||
// get all the record, sorted by modification order, newest first | |||
select * from TABLE1 order by GID descending, RID ascending | |||
=== as the only valid Reference === | |||
STOP! Do not mix up RID and GID sequence. Think about a global RID pool! | |||
New inserted Records get their (first) RID from the GID | |||
== Update a Record: if unchanged == | |||
sometimes Applications or users need long time to "change" a record. At "first" the Applications read all Record-Data from Db-Table. Then after 5 Minutes they post their changed Record-Fields by a update statement. | |||
If the Data-Mask is complex, than even more TABLES may be affected by the update. If the application not only read the REcord-Data but also read the GID, it can post an update "on the knowledge state of 8383892". If the knowledge-Sate has gone further, the update will fail - the user may be informed that the unlying data has changed in the meantime and the update is not possible by know. | |||
The App may present a diff of the "local" and new "remote" Record - let the user decide how the merge could look like. | |||
update TABLE1 set | |||
FIELD1 = NewValue | |||
where | |||
(RID=828273) and | |||
(GID=8383892) | |||
if "affected rows" = 0 RID is gone or GID is changed. | |||
== Statement-Log-Packets == | |||
A SQL-Statement like "DROP TABLE HUGEONE" may purge thousands of Pages of the Database File, and modify 1 or 2. The Log-Packet itself schould NOT contain all the changed Database Pages or such low-level Details but the SQL-Statement itself. We can trust on the fact that the spares have all the knowledge to understand the Statement and execute it exactly like the master. | |||
This information is transmitted to one or more spares. This information is also stored in the Database itself for a time (not part of a backup). At least until its prooven that a n.fabk is available. | |||
After the Log-DataBlock must succsessfully | |||
# Do the Statement, detect while doing it, if it is harmfull to the Database (W oder R) | |||
# if W: compile a log-block with the name GEN_ID(HBT,1); if R: break | |||
# Send the LOG over the line. | |||
'''Generating a LOG-Packet''' | |||
'''"R"-Statements /"W"-Statements''' | |||
Each Request, on its way through the Database holds a Flag, saying "I am Read/Only" or "I did some impact" to the Database. This Flag (the R/W-Flag) is set by the engine in the actual context, because you can formulate a statement wich is 99% a "R"-one, but there comes a day, this statement do a change to the Database. | |||
When the engine detects a change to Data or Metadata it sets the "Write"-Flag, this statement was prooven now to have a impact to the database, it is a "W"-Statement and must at its End increment the SEQ_HEART_BEAT and a Statement-Log is written. | |||
The Log-Packet is assembled by the engine step by step right in the moment by parsing and executing it. The LOG-Packet is the "compiled" version of the Statement. The LOG-Paket is written to a memory block. If a "W" Statement reads CURRENT_TIMESTAMP (Timestamp) or RND (Randomize) a copy of the returned value MUST be stored in the LOG, because the spares can not trust on own values. | |||
If a Request is fully executed, and the engine didn't set the "W"-Flag - the LOG-Packet is purged, because it did no harm to the database it can say us nothing about history. Else (a "W"-Statement) the Packet is populatetd to the spares. The server itself hold a persistent copy of the Log-Packet to help spare get up to date if they missed something. | |||
Only a successfull backup can delete the Log-Packets that are generated before the .fbak HBS. | |||
'''Quasi "W" Statements''' | |||
select CURRENT_TIMESTAMP from RDB$DATABASE | |||
is a "W"-Statement that can only be executed by the master, because the client ask for information that is NOT part of the Database itself. So all external Information going throgh the engine are Quasi-"W" (user account infos, or coming user SQL Statements). They MUST be executed by the master, but do NOT goto the Log. | |||
Statement | |||
UPDATE PERSON SET MOMENT = CURRENT_TIMESTAMP WHERE LOAN = RND(2920) | |||
Log-Paket | |||
P:RND=2093 | |||
P:CURRENT_TIMESTAMP | |||
P:CURRENT_TIMESTAMP | |||
P:CURRENT_TIMESTAMP | |||
S:UPDATE PERSON ... | |||
== | == Snap-Shot-Backups == | ||
== Statement-Playback == | |||
'''Database in spare mode''' | |||
* the LOG-Receive Thread | * the LOG-Receive Thread | ||
Zeile 42: | Zeile 165: | ||
if the spare was offline a time, she sees that local.HBT <> LOG.HBT+1. Before it can execute actual LOG-Packets she MUST ask the master (or other spares) for the missing LOG Packets. After doing the old stuff it can start to do execute the actual receiced LOG-Packtes. While this time, the log-Receive thread isnt stopping filling the LOG-Buffer. | if the spare was offline a time, she sees that local.HBT <> LOG.HBT+1. Before it can execute actual LOG-Packets she MUST ask the master (or other spares) for the missing LOG Packets. After doing the old stuff it can start to do execute the actual receiced LOG-Packtes. While this time, the log-Receive thread isnt stopping filling the LOG-Buffer. | ||
== Playback-Mode == | |||
The Playback-Mode (or Spare Mode) of a database is a new type of Database "Shutdown" Mode. Connections to this host in Spare Mode see no difference to a normal Database. | |||
work normal but refuse "W" Statement with a special error message. They may redirekted to a master. | work normal but refuse "W" Statement with a special error message. They may redirekted to a master. | ||
'''be a local spare''' | |||
the master writes the LOG-Packets (FileName <HBT>.flog) to a (remote) filesystem. | |||
'''be a cold spare''' | |||
a firebird-server receives the log-Packets but do not execute them. The worker-Thread writes the .flog Files to the filesystem if the system is idle. | |||
'''be a hot spare''' | |||
a firebird-server receives and execute the log-Packets a fast as possible. | |||
'''be a master''' | |||
'''planned switiching the role from master to spare''' | |||
# the master do not accept W Statements any more. (Try again in 2 Minutes!) | |||
# the master ensures that one spare is ready to take over (master.HTB=spare.HTB) | |||
# the master switch to "spare" Mode | |||
# the spare switch to "master" Mode | |||
'''Database in master mode''' | |||
no limitations. He must bring the Log-Packets to the world: | |||
* By UDP to the spares. | |||
* By fopen to the filesystem. | |||
* By move to the memory. | |||
== | == (add a ) RDB$Cluster Table == | ||
inside the database is the information about the spares and the master. | |||
IP Role last-HBT | |||
192.168.115.192 master | |||
192.168.115.196 spare | |||
192.168.115.80 spare | |||
30.23.12.3 spare | |||
== | == detecting the failure == | ||
* Master is Not Responding | |||
* Spare say they have problems executing crazy Packets. | |||
== Positive Side Effekts == | |||
if a client assumes that a special statement is "R" AND HBT of a spare is the HBT of the master she can post this statement to a spare to take load away from the master. The spare will bring the same result as the master. | |||
== Communication master<->spare == | == Communication master<->spare == | ||
Zeile 89: | Zeile 229: | ||
== Good to have several spares == | == Good to have several spares == | ||
it enough fore the master to have one responding spare. If a connection to a spare fails it | it is enough fore the master to have only one responding spare. If a connection to a spare fails the master retries it every 20 Seconds. | ||
== Restore Szenario == | |||
# restore the database from a .fbak | |||
# read the value of GEN_HBT | |||
# let the server execute all *.flog Files beginning from GEN_HBT+1 | |||
== New ERROR Messages == | |||
'''"W"-Statement not allowed on a spare''' | |||
the client must understand a new error-Msg when trying to place a statement to a spare that was a master before. the client must interrpret the new connection string and (re)prepare a open statements. | |||
# Error-Msg with a Retry-Option: "Im a Spare, please retry with [%server:%database]" | |||
'''Failed to Log your Statement''' | |||
if the master can not log the statement to any spare this error MSG is generated. | |||
== the lying master concept == | |||
Imaging a host wich is a spare (in truth) and tells the world that she is a Master. This is a spare wich executes all the "R" Statement, but automatically routes "W"-Statements to a master without telling the connection something about it. The result is routed back to the connection AFTER she has executed the Log-Packet, so that this spare is "true" wich its single connection. There may be a "gab" of Log-Packets (more to come here). <br> | |||
You may have a Grid of "lying masters" in a webshop scenario. Alle the "W" Work is done by one hidden master, not slowed down by anoing "R" Statements of any connection. | |||
== the lazy worker == | |||
* say "Im done" if all checks are done that ensure "T" will succseed, but dont do the work! | |||
== the Cash Assoziator == | |||
* Cashing: is a technique that repeats a saved answer to an known question without "understanding" the question again. This is Quick! | |||
* Trick 1: Always use cash if since last "R" there was no "T" | |||
* Trick 2: On first insert in cash, build a dependecy-tree, on what the Answers depends on, if asked again, only check tree, if not more developed than the saved version -> use cashed answer | |||
* Hint: While executing "W" Statement cut Cache-trees | |||
* Trick 3: Cash-Questions are parallel to Answers, we have an unlimited Count of CPUs. | |||
== the Context Bubble == | |||
* repesents the Dependecy Tree | |||
A cash Element contains the "cas-Azzoziator-ID" the "Answer" the "Context-Bubble". In "idle" Time the Cash is cleared off all the Entries wich Context-Bubbles blobs. | |||
A "R" Statement can be cashed if "Context" Bubble still is alive | |||
== the new I/O Model: IO+C (Input Output Cacheable) == | |||
* extends the classic I/O Modell by the possibility to Ask the I for a List of dependencies of the I | |||
Q:"select SUM(BETRAG) FROM KOSTEN where Cond" into Cache | |||
A:20,00 Euro (=Cache Identifier 261526271) | |||
Qa:"select SUM(BETRAG) FROM KOSTEN where Cond" from Cache | |||
Qb:Cache 261526271 still valid? | |||
Aa:Cache 261526271 !if unchanged! | |||
Ab:20,01 Euro (=Cache Identifier 261526272) !if changed! | |||
== Club-Pages == | |||
"Ich hab schon mal auf einer Grafik gesehen, dass die Index-Values mit Duplikationen irgendwie mit in dem Suchbaum gespeichert sind, ich denke das hat traditionelle Gründe. Die richtige Implementierung ist es, nur die distinct Werte der Index-Values genau einmal zu speichern, jeder Index-Value sollte dann auf eine Mitgliederliste verweisen. Die Null Values gehören nicht in den Suchbaum und sollten die Index-Performanche nicht beinflussen, sie gehören in die Mitgliederliste "NULL". | |||
" | |||
Club Pages are Record-Reference Index-Pages telling if a Record is Part of a Club or not. Updating a Record may corse a "enter" oder "leave" club event. Delete is a "leave" is it was a memeber. Insert may be "enter". | |||
A "Club" is defined by Club-Rules this may be | |||
(INCOME>10.000) | |||
or | |||
(RID<>MASTER_R> | |||
Clubs speed up OLAP because if a Statement "where" Condition match a Club-Rule. | |||
join CLUB$$INCOME on | |||
(PERSON.RID=CLUB$$INCOME.PERSON_R) | |||
== Project names alternatives == | |||
Firebirdie from "firebird D", "D" for Deterministic Mode | |||
GenerationsCount: SQL Statement, das mich dazu gemacht hat. | |||
== Links == | |||
http://www.linbit.com/ | |||
== granid == | |||
// a simple artefact was found in space | |||
// just a small stone | |||
// a texture like granit | |||
// a first it seems like rauschen | |||
// than a set of algorithems | |||
// where extracted by the intelligence | |||
// | |||
// i am granid. a database server. i am open source | |||
// | |||
// several elements mixed up | |||
// by one spirit | |||
// one force arranged granulat | |||
// big pressure, fire and heat | |||
// then - silence - one piece | |||
// a stone was born | |||
// | |||
// performance, stability, transparence | |||
// no past, only the future | |||
// no limitations, only the chanche to make it faster | |||
// | |||
// ANFICO - The Language Extender | |||
// BLAGER - The Binary Lager | |||
// D64 - 64 Bit Data-Manager | |||
// | |||
// | |||
// Data: <UInt32 TimeStamp> <UInt64 RecordChangeTag> | |||
// | |||
// | |||
== Neu: CLUB == | |||
In Zukunft sollen Parameter Eingabedialoge abfragbar sein. (Ev. mit Historie und default Belegungen) | |||
In Zukunft sollen Dimensionen vordefiniert sein (Artikel,Belege, Verkäufe, Personen). | |||
In Zukunft sollen Attribute angebbar sein (Artikel.Anlage > -3 Monate, Person.Alter > 40). | |||
In Zukunft sollen Betriebsbegriffe vordefinierbar sein.<br> | |||
Umsatz(eines Artikel) (select MENGE_GELIEFERT from POSTEN-Club)<br> | |||
Umsatz(einer Person) (select DAVON_BEZAHLT from BELEG-Club)<br> | |||
<br> | |||
der POSTEN-Club kann dann einfach * sein, oder wieder selektiert. | |||
n-Idee: Jede Dimension hat üben, in die manden Abfrage-Faden flechten kann. | |||
OLAP-Admins definieren Clubs, Dimensionen und en. | |||
Zeiträume sollten nun (Jahrgenauer |Taggenauer | Monatgenauer | Wochengenauer) kummuliert werden. Das ist die Granularität Summe ist die größe Körnung. | |||
Club: A (maybe empty) List of Members | |||
Rule: A piece of deterministic logical code that decides, if a record is a Member or not. Club uses Rules to define if one is a member or not. | |||
Mem ber: A Record that is a part of one ore more clubs | |||
isMember(): Answer the Question if a En | |||
Field: space in a Record that can store a value , or can be empty | |||
Record: One Line in a Table | |||
Table: The Collection of Lines | |||
Welcome: Somewhat happend, you are now gain membership of a club | |||
Dependencies: A stored hint, that the system has to check again some rules to say welcome or good bye | |||
transformer: a piece of code that calculates a in64 from a key of | |||
Transaction: a of atoms | |||
atom: a single changing operation done to the base | |||
== Neu: Begriff == | |||
Begriffe wie "Kunde" und "Umsatz" oder "Sortimentsbewegung" werden als OLAP-Statements definiert. Ein zentrale OLAP Abfrage könnte nun Fragestellungen mit Hilfe von "Begriffen" formulieren. Beispiel: | |||
* Alle Kunden mit Sortimentsbewegung in den Sortimenten "A" und "B" im Zeitraum Januar bis März 2010 | |||
Die Auflösung würde so erfolgen, dass zunächst die Begriffe | |||
# 1. Schritt: Begriffstabellen bilden | |||
Kunden('01.01.2010','31.03.2010') | |||
Sortimentsbewegung('01.01.2010','31.03.2010') | |||
ausgelöst werden. Diese Begriffs-Eregbnise werden in temporären Ergebnistabelle zuwischengespeichert. Alles über EIN zentrales online SQL-Statement lösen zu wollen hat sich als unwirtschaftlich herausgestellt. | |||
# 2. Schritt | |||
select PERSON_R from Kunden | |||
join Sortimentsbewegung where | |||
(Kunden.PERSON_R=Sortimentsbewegung.PERSON_R) and | |||
( | |||
(Sortimentsbewegung.SORTIMENT_R='A') or | |||
(Sortimentsbewegung.SORTIMENT_R='B') or | |||
) | |||
würde man bei firebird alles in eine gemeinsame Transaktion packen, so könnte man "CREATE GLOBAL TEMPORARY TABLE ON COMMIT DELETE" machen. |
Aktuelle Version vom 18. Oktober 2013, 16:57 Uhr
(c) '2007-2008 by Andreas Filsinger
Redundant Array of Independent dataBases
The goal of RAIB (for firebird 3.0?!) is to ensure the availability of a firebird database - even if hardware fails. This is done by fail-over servers (spares) or Log-Packets (.flog File) on a remote Filesystem. In other Environments (without RAIB) you loose all Data from the Moment of your last Backup until the Moment of faliure. With RAIB you are able to get your Database up to the state where it was. This is just a concept - BUT implementing this ideas will make firebird 3.0 (with the vulcan engine) unbreakable.
Remember that malfunction of a single transistor in your high-end-server will bring firebird down (And modern servers have millions of it). Maybe your HD is OK, or you trust on a RAID 5 Storage. But the Question is: How long does it take to get the firebird-service online again!
With RAIB you can immediatly connect your application to a spare-Host after the failure. In this Moment you are restricted to "read-only" if you reduce to "read only" SQL-Statements, a admin can switch one spare to the master mode, so all the functionality of the firebird database can be used.
How it will work
The work is done by 8 (new?) techniques, not yet implemented in firebird:
- Statement-Classification: by evaluating and executing a statement it is classified as a "R"- "W"- or "I"-Statement.
- Heart-Beat-Sequence: a BIGINT living inside the database, counts the impacts to the database since the Statement "CREATE DATABASE". This will help naming Statement-Log-Packets
- Statement-Order: A single Instance that decides in wich order the incoming Statements are executet. Inside a time window, this instance is free to do this ordering, but the order MUST be logged.
- Statement-Packets-Logger: Logs DDL/DML "W"-Statements in a way so they can build basis of a Statement - "playback".
- snap-shot-backups: a Backup-Mode where all the impact AFTER the Backups was started are invisible to the Backup.
- Statement-Playback: a system where server can execute "Statement-Packets".
- Playback-Mode: gbak sets a database in this mode when doing statement Playback to a database.
- Deterministic SQL: all Statements MUST end in computable results, so they can be executed in a server-swarm
Heart-Beat-Sequence
SEQ_HEART_BEAT
A Database has an internal SEQUENCE (GENERATOR) (the Heart-Beat-Sequence = HBS [BIGINT]) that is incremented after each W-Statement commit doing some change to data or metadata. A read-Only Statement (I/R) should not incrememt the HBT.
Serialize
The master must have a mechanism to force quasi parallel W-Statements in an serialized order. The assigned HBS Value will make clear in wich order the statements are executed. Long time conspumting statements are not a problem because a W-Statement with HBS = n, only sees the Database in the Snap-Shop State n-1.
Statement Classification
firebird executes SQL-Code like "DROP TABLE SAMPLE" or Service Requests like "BACKUP DATABASE". If we want something like a spare-server we should filter the requests that really do some change to the Database. The three classes are "W"rite / "R"ead and "I"nformal.
"W" Requests
"R" Requests
"I" Requests
Samples
Client-Request: DROP TABLE CLUB$2873
Server-Classification: "W" / SEQ_HEART_BEAT=928372
Client-Request: SELECT CURRENT_TIMESTAMP from RDB$DATABASE
Server-Classification: "I"
Client-Request: SELECT COUNT(RID) from HUGETABLE
Server-Classification: R
Deterministic SQL
a quest to impact that a Statement do to the Database, that it will have the same result independant when where and why executed. Forbidden are Sideeffekts like "CURRENT_TIMESTAMP", "RAND()", results of USER Tabels or PERFORMANCE Tables or Logik that depends on Bits that are Outside the Database itself.
Why do we need this:
- Because of relyable playbacks.
- Because of an exetuting swarm, resulting in a 100% identical database
GID
as a "changed" marker
The GID is a global int64 Sequence (Generator) and a "implizit" Field in every DB-Table wich mark or tag Records, Tables, Views and the Database itself. By now there are no arguments for having a GID Granularity set to Field-Level.
- Every Time a "w" statement was completed, the Database GID gets incremented. You can read the actual GID by
select GID from RDB$DATABASE
- Also this new GID Value is store to the affecte TABLE(s). You can read this.
select GID from TABLE1 // seems the same like select MAX(GID) from TABLE1
- Every Record holds its GID
// get the actual GID of a record select GID from TABLE1 where RID=2029382 // get all the record, sorted by modification order, newest first select * from TABLE1 order by GID descending, RID ascending
as the only valid Reference
STOP! Do not mix up RID and GID sequence. Think about a global RID pool! New inserted Records get their (first) RID from the GID
Update a Record: if unchanged
sometimes Applications or users need long time to "change" a record. At "first" the Applications read all Record-Data from Db-Table. Then after 5 Minutes they post their changed Record-Fields by a update statement. If the Data-Mask is complex, than even more TABLES may be affected by the update. If the application not only read the REcord-Data but also read the GID, it can post an update "on the knowledge state of 8383892". If the knowledge-Sate has gone further, the update will fail - the user may be informed that the unlying data has changed in the meantime and the update is not possible by know. The App may present a diff of the "local" and new "remote" Record - let the user decide how the merge could look like.
update TABLE1 set FIELD1 = NewValue where (RID=828273) and (GID=8383892)
if "affected rows" = 0 RID is gone or GID is changed.
Statement-Log-Packets
A SQL-Statement like "DROP TABLE HUGEONE" may purge thousands of Pages of the Database File, and modify 1 or 2. The Log-Packet itself schould NOT contain all the changed Database Pages or such low-level Details but the SQL-Statement itself. We can trust on the fact that the spares have all the knowledge to understand the Statement and execute it exactly like the master. This information is transmitted to one or more spares. This information is also stored in the Database itself for a time (not part of a backup). At least until its prooven that a n.fabk is available. After the Log-DataBlock must succsessfully
- Do the Statement, detect while doing it, if it is harmfull to the Database (W oder R)
- if W: compile a log-block with the name GEN_ID(HBT,1); if R: break
- Send the LOG over the line.
Generating a LOG-Packet
"R"-Statements /"W"-Statements
Each Request, on its way through the Database holds a Flag, saying "I am Read/Only" or "I did some impact" to the Database. This Flag (the R/W-Flag) is set by the engine in the actual context, because you can formulate a statement wich is 99% a "R"-one, but there comes a day, this statement do a change to the Database. When the engine detects a change to Data or Metadata it sets the "Write"-Flag, this statement was prooven now to have a impact to the database, it is a "W"-Statement and must at its End increment the SEQ_HEART_BEAT and a Statement-Log is written.
The Log-Packet is assembled by the engine step by step right in the moment by parsing and executing it. The LOG-Packet is the "compiled" version of the Statement. The LOG-Paket is written to a memory block. If a "W" Statement reads CURRENT_TIMESTAMP (Timestamp) or RND (Randomize) a copy of the returned value MUST be stored in the LOG, because the spares can not trust on own values. If a Request is fully executed, and the engine didn't set the "W"-Flag - the LOG-Packet is purged, because it did no harm to the database it can say us nothing about history. Else (a "W"-Statement) the Packet is populatetd to the spares. The server itself hold a persistent copy of the Log-Packet to help spare get up to date if they missed something. Only a successfull backup can delete the Log-Packets that are generated before the .fbak HBS.
Quasi "W" Statements
select CURRENT_TIMESTAMP from RDB$DATABASE
is a "W"-Statement that can only be executed by the master, because the client ask for information that is NOT part of the Database itself. So all external Information going throgh the engine are Quasi-"W" (user account infos, or coming user SQL Statements). They MUST be executed by the master, but do NOT goto the Log.
Statement
UPDATE PERSON SET MOMENT = CURRENT_TIMESTAMP WHERE LOAN = RND(2920)
Log-Paket
P:RND=2093 P:CURRENT_TIMESTAMP P:CURRENT_TIMESTAMP P:CURRENT_TIMESTAMP S:UPDATE PERSON ...
Snap-Shot-Backups
Statement-Playback
Database in spare mode
- the LOG-Receive Thread
the master initiates communication to a spare and post LOG Packets to the spare. Alle Packets are flushed into memory - if CRC is ok, the spare sends a ACK to the master. The master do NOT wait until the spare executes the statement because the master already prooved that the statement is executable.
- the Worker-Thread
read the LOG-Packets, and execute it against the own Database.
- LOG-missed
if the spare was offline a time, she sees that local.HBT <> LOG.HBT+1. Before it can execute actual LOG-Packets she MUST ask the master (or other spares) for the missing LOG Packets. After doing the old stuff it can start to do execute the actual receiced LOG-Packtes. While this time, the log-Receive thread isnt stopping filling the LOG-Buffer.
Playback-Mode
The Playback-Mode (or Spare Mode) of a database is a new type of Database "Shutdown" Mode. Connections to this host in Spare Mode see no difference to a normal Database. work normal but refuse "W" Statement with a special error message. They may redirekted to a master.
be a local spare
the master writes the LOG-Packets (FileName <HBT>.flog) to a (remote) filesystem.
be a cold spare
a firebird-server receives the log-Packets but do not execute them. The worker-Thread writes the .flog Files to the filesystem if the system is idle.
be a hot spare
a firebird-server receives and execute the log-Packets a fast as possible.
be a master
planned switiching the role from master to spare
- the master do not accept W Statements any more. (Try again in 2 Minutes!)
- the master ensures that one spare is ready to take over (master.HTB=spare.HTB)
- the master switch to "spare" Mode
- the spare switch to "master" Mode
Database in master mode
no limitations. He must bring the Log-Packets to the world:
- By UDP to the spares.
- By fopen to the filesystem.
- By move to the memory.
(add a ) RDB$Cluster Table
inside the database is the information about the spares and the master.
IP Role last-HBT 192.168.115.192 master 192.168.115.196 spare 192.168.115.80 spare 30.23.12.3 spare
detecting the failure
- Master is Not Responding
- Spare say they have problems executing crazy Packets.
Positive Side Effekts
if a client assumes that a special statement is "R" AND HBT of a spare is the HBT of the master she can post this statement to a spare to take load away from the master. The spare will bring the same result as the master.
Communication master<->spare
if a spare comes up (on connection request of a master), a spare can ask the master how
Good to have several spares
it is enough fore the master to have only one responding spare. If a connection to a spare fails the master retries it every 20 Seconds.
Restore Szenario
- restore the database from a .fbak
- read the value of GEN_HBT
- let the server execute all *.flog Files beginning from GEN_HBT+1
New ERROR Messages
"W"-Statement not allowed on a spare
the client must understand a new error-Msg when trying to place a statement to a spare that was a master before. the client must interrpret the new connection string and (re)prepare a open statements.
- Error-Msg with a Retry-Option: "Im a Spare, please retry with [%server:%database]"
Failed to Log your Statement
if the master can not log the statement to any spare this error MSG is generated.
the lying master concept
Imaging a host wich is a spare (in truth) and tells the world that she is a Master. This is a spare wich executes all the "R" Statement, but automatically routes "W"-Statements to a master without telling the connection something about it. The result is routed back to the connection AFTER she has executed the Log-Packet, so that this spare is "true" wich its single connection. There may be a "gab" of Log-Packets (more to come here).
You may have a Grid of "lying masters" in a webshop scenario. Alle the "W" Work is done by one hidden master, not slowed down by anoing "R" Statements of any connection.
the lazy worker
- say "Im done" if all checks are done that ensure "T" will succseed, but dont do the work!
the Cash Assoziator
- Cashing: is a technique that repeats a saved answer to an known question without "understanding" the question again. This is Quick!
- Trick 1: Always use cash if since last "R" there was no "T"
- Trick 2: On first insert in cash, build a dependecy-tree, on what the Answers depends on, if asked again, only check tree, if not more developed than the saved version -> use cashed answer
- Hint: While executing "W" Statement cut Cache-trees
- Trick 3: Cash-Questions are parallel to Answers, we have an unlimited Count of CPUs.
the Context Bubble
- repesents the Dependecy Tree
A cash Element contains the "cas-Azzoziator-ID" the "Answer" the "Context-Bubble". In "idle" Time the Cash is cleared off all the Entries wich Context-Bubbles blobs. A "R" Statement can be cashed if "Context" Bubble still is alive
the new I/O Model: IO+C (Input Output Cacheable)
- extends the classic I/O Modell by the possibility to Ask the I for a List of dependencies of the I
Q:"select SUM(BETRAG) FROM KOSTEN where Cond" into Cache A:20,00 Euro (=Cache Identifier 261526271) Qa:"select SUM(BETRAG) FROM KOSTEN where Cond" from Cache Qb:Cache 261526271 still valid? Aa:Cache 261526271 !if unchanged! Ab:20,01 Euro (=Cache Identifier 261526272) !if changed!
Club-Pages
"Ich hab schon mal auf einer Grafik gesehen, dass die Index-Values mit Duplikationen irgendwie mit in dem Suchbaum gespeichert sind, ich denke das hat traditionelle Gründe. Die richtige Implementierung ist es, nur die distinct Werte der Index-Values genau einmal zu speichern, jeder Index-Value sollte dann auf eine Mitgliederliste verweisen. Die Null Values gehören nicht in den Suchbaum und sollten die Index-Performanche nicht beinflussen, sie gehören in die Mitgliederliste "NULL". "
Club Pages are Record-Reference Index-Pages telling if a Record is Part of a Club or not. Updating a Record may corse a "enter" oder "leave" club event. Delete is a "leave" is it was a memeber. Insert may be "enter". A "Club" is defined by Club-Rules this may be
(INCOME>10.000)
or
(RID<>MASTER_R>
Clubs speed up OLAP because if a Statement "where" Condition match a Club-Rule.
join CLUB$$INCOME on (PERSON.RID=CLUB$$INCOME.PERSON_R)
Project names alternatives
Firebirdie from "firebird D", "D" for Deterministic Mode
GenerationsCount: SQL Statement, das mich dazu gemacht hat.
Links
granid
// a simple artefact was found in space // just a small stone // a texture like granit // a first it seems like rauschen // than a set of algorithems // where extracted by the intelligence // // i am granid. a database server. i am open source // // several elements mixed up // by one spirit // one force arranged granulat // big pressure, fire and heat // then - silence - one piece // a stone was born // // performance, stability, transparence // no past, only the future // no limitations, only the chanche to make it faster // // ANFICO - The Language Extender // BLAGER - The Binary Lager // D64 - 64 Bit Data-Manager // // // Data: <UInt32 TimeStamp> <UInt64 RecordChangeTag> // //
Neu: CLUB
In Zukunft sollen Parameter Eingabedialoge abfragbar sein. (Ev. mit Historie und default Belegungen)
In Zukunft sollen Dimensionen vordefiniert sein (Artikel,Belege, Verkäufe, Personen).
In Zukunft sollen Attribute angebbar sein (Artikel.Anlage > -3 Monate, Person.Alter > 40).
In Zukunft sollen Betriebsbegriffe vordefinierbar sein.
Umsatz(eines Artikel) (select MENGE_GELIEFERT from POSTEN-Club)
Umsatz(einer Person) (select DAVON_BEZAHLT from BELEG-Club)
der POSTEN-Club kann dann einfach * sein, oder wieder selektiert.
n-Idee: Jede Dimension hat üben, in die manden Abfrage-Faden flechten kann.
OLAP-Admins definieren Clubs, Dimensionen und en.
Zeiträume sollten nun (Jahrgenauer |Taggenauer | Monatgenauer | Wochengenauer) kummuliert werden. Das ist die Granularität Summe ist die größe Körnung.
Club: A (maybe empty) List of Members Rule: A piece of deterministic logical code that decides, if a record is a Member or not. Club uses Rules to define if one is a member or not. Mem ber: A Record that is a part of one ore more clubs isMember(): Answer the Question if a En Field: space in a Record that can store a value , or can be empty Record: One Line in a Table Table: The Collection of Lines Welcome: Somewhat happend, you are now gain membership of a club Dependencies: A stored hint, that the system has to check again some rules to say welcome or good bye transformer: a piece of code that calculates a in64 from a key of Transaction: a of atoms atom: a single changing operation done to the base
Neu: Begriff
Begriffe wie "Kunde" und "Umsatz" oder "Sortimentsbewegung" werden als OLAP-Statements definiert. Ein zentrale OLAP Abfrage könnte nun Fragestellungen mit Hilfe von "Begriffen" formulieren. Beispiel:
- Alle Kunden mit Sortimentsbewegung in den Sortimenten "A" und "B" im Zeitraum Januar bis März 2010
Die Auflösung würde so erfolgen, dass zunächst die Begriffe
# 1. Schritt: Begriffstabellen bilden Kunden('01.01.2010','31.03.2010') Sortimentsbewegung('01.01.2010','31.03.2010')
ausgelöst werden. Diese Begriffs-Eregbnise werden in temporären Ergebnistabelle zuwischengespeichert. Alles über EIN zentrales online SQL-Statement lösen zu wollen hat sich als unwirtschaftlich herausgestellt.
# 2. Schritt select PERSON_R from Kunden join Sortimentsbewegung where (Kunden.PERSON_R=Sortimentsbewegung.PERSON_R) and ( (Sortimentsbewegung.SORTIMENT_R='A') or (Sortimentsbewegung.SORTIMENT_R='B') or )
würde man bei firebird alles in eine gemeinsame Transaktion packen, so könnte man "CREATE GLOBAL TEMPORARY TABLE ON COMMIT DELETE" machen.