TDT4145 - Data Modelling, Databases and Database Management Systems
Course material
Data modelling
Term | Explanation |
---|---|
Mini-world | The part of the real world (the concepts) that is to be modelled. |
Data model | A set of concepts for describing structure (which types of data, relationships and constraints) and operations (actions on the data: insertion, update, deletion and querying). |
Conceptual models | Used to understand and document. For example ER models. |
Implementation models | Used for realisation. For example relational database models. |
Physical models | Control over storage and optimisation. These are product-specific models. |
Database | A structured collection of related data. |
Information | Data + interpretation |
DBMS | Database management system. Software that manages storage of and access to the data in the database. |
Database system | DBMS + data |
Self-describing DBMS | Contains both the data and descriptions of the data (metadata). This is generally done in the DBMS catalog. |
Self-describing data models | Include data descriptions in the data itself. For example XML and JSON. |
Program-data independence | The property that a structural change in a data file does not force changes to the programs that access the data. |
Multi-user support | Several users can retrieve and store data at the same time without problems. |
Entity integrity | Every relation schema must have a primary key. Primary keys must be unique and cannot be NULL. |
Referential integrity | If a foreign key is not NULL, it must reference an existing row in the table it points to. |
Database state | The database instance (the data) at a given point in time. |
Consistent database state | The database satisfies all integrity rules and the relevant rules of the mini-world. |
Transaction | A series of operations on a database that together preserve the consistency of the database. |
ER modelling
Important! ER notation:
Entities and entity classes
Term | Explanation |
---|---|
Entity | An object that exists in the mini-world. An element of an entity class. |
Entity class | The set of all similar entities that are of the same class (type) and have the same properties. |
Weak entity class | An entity class whose key (partial key) builds on the key of another, identifying entity class, through an identifying relationship class. It must be existence-dependent on participation in the identifying relationship class. |
Attributes
Attributes describe entities. Each attribute draws its possible values from a domain (data type).
Attribute type | Notation | Explanation |
---|---|---|
Simple attribute (default) | None | An ordinary attribute |
Multivalued attribute | Double line | Can hold several values |
Composite attribute | Has its own sub-attributes | Is composed of its own sub-attributes |
Derived attribute | Dashed line | Can be derived from other attributes; does not need to be stored explicitly |
Key attribute | Underlined | Identifier for the entity |
Relationships and relationship classes
Term | Explanation |
---|---|
Relationship | A connection between two or more entities. Can have attributes in the same way as entities. The relationship does not exist without the entities that participate in it. |
Relationship class | The set of similar relationships between the same entity classes. |
Recursive relationship class | A relationship class in which the same entity class participates more than once. The entity class can then play several roles in the relationship class. |
Cardinality and structural constraints
The course lecturer prefers structural constraints.
Term | Explanation |
---|---|
Cardinality | Using 1 or N on each side of the relationship indicates whether it is one-to-one, one-to-many or many-to-many. Noted on the opposite side of the relationship from the entity in question. A single line indicates partial participation (the entities do not have to participate in the relationship); a double line indicates total participation (the entities must participate in the relationship at least once). |
Structural constraint (min, max) | States that an entity participates in the relationship at least min times and at most max times. Common constraints are (0, 1), (1, 1), (0, n) and (1, n). Noted on the same side of the relationship as the entity in question. |
Instance diagram
EER modelling
Specialisation and generalisation
Term | Explanation |
---|---|
Superclass-subclass relationship | The subclasses inherit the properties (attributes and relationships) of the superclass, but can also have their own attributes and relationships. A subclass is a subset of its superclass. |
Total specialisation | Every entity in the superclass must belong to at least one subclass. Drawn with a double line. |
Overlapping subclasses | An entity in the superclass can belong to more than one subclass. Drawn with an O, or with no annotation. |
Disjoint subclasses | An entity in the superclass cannot belong to more than one subclass. Drawn with a D. |
An entity can belong to | Disjoint | Overlapping |
---|---|---|
Partial | 0-1 subclasses | 0-n subclasses |
Total | 1 subclass | 1-n subclasses |
Categories
A category represents a collection of entities that is a subset of the union of entities from different entity classes (the category's superclasses). It is also called a union type. A category can be partial or total. A total category contains all the entities in the union of its superclasses and is marked with a double line between the category and the circle.
Relasjonsdatabaser
Relasjonersdatabaser er bygd opp av tabeller der kolonner angir attributter og rader angir entiteter eller relasjoner.
Mapping fra ER-modell
Konsept | Hvordan mappe |
---|---|
Regulære entitetsklasser | Entitetsklassen får egen tabell. Attributter blir kolonner. Sammensatte attributter splittes opp. Primærnøkkel fra ER-modell brukes. |
Svake entitetstyper | Legger til primærnøkkel (fremmednøkkel) fra identifiserende entitetsklasser, sammen med klassens delvise nøkkel. Ellers likt. |
Binære 1:1 relasjoner | Fremmednøkkel kan legges i en av entitetsklassetabellene, eller så kan relasjonen få egen tabell. |
Binære 1:N relasjoner | Fremmednøkkel kan legges i entiteten med kardinalitet 1, eller så kan relasjonen få egen tabell. |
Binære N:N relasjoner | Relasjonen får egen tabell. |
Flerverdiattributter | Får egen tabell med fremmednøkkel mot entitetsklassen. |
N-ære relasjonsklasser (N > 2) | Får egen tabell med fremmednøkler mot alle tilhørende entitetsklasser. |
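A minimal SQL sketch of the N:N rule, using the Person and JobberMed tables that appear later in these notes (the composite primary key shown here is an assumption):
-- Sketch: an N:N (here recursive) relationship gets its own table
-- with foreign keys to the participating entity class
CREATE TABLE JobberMed (
PnrA INTEGER NOT NULL,
PnrB INTEGER NOT NULL,
CONSTRAINT JobberMed_PK PRIMARY KEY (PnrA, PnrB),
CONSTRAINT JobberMed_FK1 FOREIGN KEY (PnrA) REFERENCES Person(Pnr),
CONSTRAINT JobberMed_FK2 FOREIGN KEY (PnrB) REFERENCES Person(Pnr));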
Mapping from an EER model
Concept | How to map |
---|---|
Specialisation/generalisation | A) Separate tables for the superclass and all subclasses. Comment: data about one entity is spread over several tables, so joins are needed. B) Separate tables for the subclasses, each including the attributes of the superclass. Comment: requires total and disjoint specialisation. C) One table that includes the attributes of the superclass and all subclasses. Comment: with disjoint specialisation a type attribute is required; with overlapping specialisation a boolean type vector (one column per type) is required. |
Categories | One table for the category entity class, with a type attribute. The superclasses usually have different keys, so a surrogate key is introduced and added to each superclass table as a foreign key to the category. |
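A minimal sketch of alternative C for a hypothetical Kjoretoy (vehicle) superclass with disjoint subclasses Bil and Lastebil (the example is illustrative, not from the lecture material):
CREATE TABLE Kjoretoy (
RegNr VARCHAR(7) NOT NULL,
KjoretoyType CHAR(1), /* type attribute: 'B' = Bil, 'L' = Lastebil */
AntallSeter INTEGER, /* Bil attribute, NULL for other types */
Lastekapasitet INTEGER, /* Lastebil attribute, NULL for other types */
CONSTRAINT Kjoretoy_PK PRIMARY KEY (RegNr));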
Relational algebra
In relational algebra we use operations to manipulate relational database tables. Note that a row here is a tuple and a table is a set of tuples, so there are no duplicates. The operators are closed over tables (both input and output are sets).
Operations
Operation | Operator | Explanation |
---|---|---|
Selection | σ_condition(A) | Returns the rows of A that satisfy the condition. |
Projection | π_columns(A) | Returns the given columns of A. |
Union | A ∪ B | Returns the union of A and B. Requires identical column names. |
Intersection | A ∩ B | Returns the intersection of A and B. Requires identical column names. |
Difference | A − B | Returns the rows of A that are not in B. Requires identical column names. |
Cartesian product | A × B | Combines every row of A with every row of B. |
Inner join | A ⨝_condition B | Combines the rows of A and B that together satisfy the condition. |
Natural join | A * B | Combines the rows of A and B where all columns in A and B with the same name also have the same value. |
Left outer join | A ⟕_condition B | Combines the rows of A and B that together satisfy the condition, but also includes the remaining rows of A (with NULL values in the new columns). |
Right outer join | A ⟖_condition B | Combines the rows of A and B that together satisfy the condition, but also includes the remaining rows of B (with NULL values in the new columns). |
Full outer join | A ⟗_condition B | Combines the rows of A and B that together satisfy the condition, but also includes the remaining rows of A and B (with NULL values in the new columns). |
Renaming | ρ_newnames(A) | Returns a new table with new column names. |
Sorting | τ_attributes(A) | Returns a table in which the rows are sorted by the given attributes. ASC (ascending) or DESC (descending) order can be specified. |
Grouping and aggregation
Performs calculations "per group". In the notation, the grouping attributes are placed on the left-hand side of the aggregation operator ℱ and the aggregation functions on the right-hand side. The result columns can also be given aliases. Example: given a table Foto(Fotograf, Motiv), the expression Fotograf ℱ COUNT Motiv (Foto) returns a table with the number of motifs each photographer has photographed.
Function | Explanation |
---|---|
SUM | Sum of the values |
AVG | Average |
MIN | Minimum |
MAX | Maximum |
COUNT attribute | Number of values (excluding NULL) |
COUNT * | Number of rows |
SQL
A declarative query language ("what, not how"), so we do not need to think about optimisation. Unlike relational algebra it is not set-oriented: result tables can contain identical rows, and duplicates must be removed explicitly if that is wanted.
Creating data
DDL - Data Definition Language
CREATE TABLE /* create a table */
ALTER TABLE /* modify a table */
DROP TABLE /* remove a table */
CREATE TABLE Person (
Pnr INTEGER NOT NULL,
Navn VARCHAR(30),
Alder INTEGER,
CONSTRAINT Person_PK PRIMARY KEY (Pnr));
Foreign key constraints
CONSTRAINT RestriksjonsNavn FOREIGN KEY (Attributt) REFERENCES FremmedTabell(FremmedAttributt)
ON UPDATE <>
ON DELETE <>);
Options for <>:
- NO ACTION (default) / RESTRICT: stops the operation
- SET NULL: sets the value to NULL
- CASCADE: updates/deletes the corresponding rows
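A minimal sketch combining these options for the dog-owner reference used later in these notes (the Hund table and its key Hnr are assumed here):
-- Sketch: owner reference that follows Person key changes and
-- clears the reference if the owner is deleted
CREATE TABLE Hund (
Hnr INTEGER NOT NULL,
Navn VARCHAR(30),
EierPnr INTEGER,
CONSTRAINT Hund_PK PRIMARY KEY (Hnr),
CONSTRAINT Hund_FK FOREIGN KEY (EierPnr) REFERENCES Person(Pnr)
ON UPDATE CASCADE
ON DELETE SET NULL);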
Manipulating data
DML - Data Manipulation Language
Inserting data - INSERT INTO
INSERT INTO Person VALUES (1, 'Ola', 24);
INSERT INTO Person VALUES (2, 'Kari', 45);
INSERT INTO Person VALUES (3, 'Per', 14);
Updating data - UPDATE
UPDATE Person
SET Navn = 'Karianne'
WHERE Pnr = 2
Deleting data - DELETE
DELETE FROM Person
WHERE Navn = 'Ola'
Queries - SELECT
Note: SELECT * means that we want all attributes.
SELECT attribute, ...
FROM table, ...
WHERE condition
Relational algebra equivalent: π_attribute, ... (σ_condition (table × ...))
Specifying conditions - WHERE
The following operators work as expected.
Operator |
---|
AND , OR |
= , != , < , > , <= , >= |
Note: <> is the same as != .
-- Finds the person number and name of persons older than 20
SELECT Pnr, Navn
FROM Person
WHERE Alder > 20
Removing duplicates - DISTINCT
SELECT DISTINCT Navn
FROM Person
Sorting - ORDER BY
SELECT Navn
FROM Person
ORDER BY Alder ASC
String comparison - LIKE
% matches an arbitrary number of characters. _ matches exactly one character.
-- Finds all attributes of persons whose name begins with P
SELECT *
FROM Person
WHERE Navn LIKE 'P%'
Data from several tables: Cartesian product and the different types of join
Specified in the FROM clause. Given tables A and B:
Function | Description |
---|---|
A, B | Cartesian product |
A CROSS JOIN B | Cartesian product |
A NATURAL JOIN B | Natural join |
A (INNER) JOIN B | Inner join |
A LEFT (OUTER) JOIN B | Left outer join |
A RIGHT (OUTER) JOIN B | Right outer join |
A FULL (OUTER) JOIN B | Full outer join |
Parentheses indicate that the keyword is optional. See the relational algebra section for a closer explanation of the operations.
Condition | Description |
---|---|
NATURAL | Natural join |
ON condition | Specifies the join condition |
USING (attribute, ...) | Specifies which attributes must be equal in the two tables |
-- Finds the names of dogs and their owners for persons older than 60
SELECT Person.Navn, Hund.Navn
FROM Person INNER JOIN Hund ON Person.Pnr = Hund.EierPnr
WHERE Person.Alder > 60
Aggregation
See aggregation under relational algebra.
The functions ignore NULL values. With DISTINCT the functions operate on unique values only. AS defines an alias.
-- Finds the average age of all persons
SELECT AVG(Alder) AS SnittAlder
FROM Person
Grouping - GROUP BY
Defines groups (partitions) of rows. Must come after any WHERE clause.
-- Finds the average age for each name
SELECT Navn, AVG(Alder) AS SnittAlder
FROM Person
GROUP BY Navn
Condition after grouping - HAVING
Like WHERE, but applied after grouping.
-- Finds the average age for each name, but only if the average is above 60
SELECT Navn, AVG(Alder) AS SnittAlder
FROM Person
GROUP BY Navn
HAVING AVG(Alder) > 60
Nested queries
In the WHERE clause we can compare with the result of another query. In the FROM clause we can run a new query on the result of a query.
Comparison operators
=, !=, <, >, <=, >= work as expected.
-- Finds the name of the oldest person
SELECT Navn
FROM Person
WHERE Alder = (SELECT MAX(Alder) FROM Person)
ANY, ALL
-- Finds the name of the oldest person (alternative formulation)
SELECT Navn
FROM Person
WHERE Alder >= ALL (SELECT Alder FROM Person)
IN, NOT IN
-- Finds the average age of authors
SELECT AVG(Alder)
FROM Person
WHERE Navn IN (SELECT Forfatter FROM Bok)
EXISTS
This is a correlated subquery, i.e. data from the outer query is passed into it (Person rows in this case).
-- Finds the names of persons who do not work with any other person
SELECT Navn
FROM Person
WHERE NOT EXISTS (
SELECT *
FROM JobberMed
WHERE JobberMed.PnrA = Person.Pnr OR JobberMed.PnrB = Person.Pnr
)
Union, intersection and difference
Requires compatible operand tables (identical column names).
Function | Explanation |
---|---|
UNION | Union |
INTERSECT | Intersection |
EXCEPT | Difference |
UNION ALL | Union (keeps duplicates) |
INTERSECT ALL | Intersection (keeps duplicates) |
EXCEPT ALL | Difference (keeps duplicates) |
-- Finds names used for both persons and dogs
SELECT Navn FROM Person
INTERSECT
SELECT Navn FROM Hund
Nested queries in the FROM clause
In the FROM clause we can run a new query on the result of another query. This creates a temporary (derived) table.
-- Finds the number of workers for engineering professions with fewer than 1000 workers
SELECT Yrke, AntallArbeidere
FROM (
SELECT Yrke, COUNT(Pnr) AS AntallArbeidere
FROM Person
GROUP BY Yrke
) AS YrkeStatistikk
WHERE Yrke LIKE '%ingeniør' AND AntallArbeidere < 1000
Virtual tables - VIEW
Views are tables that are derived from the tables stored in the database. Why: query simplification, security and performance.
CREATE VIEW Forelder(ForelderPnr, ForelderNavn, BarnPnr, BarnNavn)
AS SELECT F.Pnr, F.Navn, B.Pnr, B.Navn
FROM Person AS F INNER JOIN Person AS B
ON (F.Pnr = B.MorPnr OR F.Pnr = B.FarPnr)
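A brief usage sketch: the view can then be queried like an ordinary table (the person number 2 below is just the sample value inserted earlier).
-- Finds the names of the children of the person with Pnr 2
SELECT BarnNavn
FROM Forelder
WHERE ForelderPnr = 2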
Normalisation
We want to avoid redundancy (storing the same information several times) and insertion, update and deletion anomalies.
Constraints
Term | Explanation |
---|---|
Implicit constraints | Part of the data model and therefore always enforced by the DBMS. |
Explicit constraints | Can be expressed in the data model (the database schema). Enforced by the DBMS. Examples: primary keys, foreign keys, data types, value restrictions, etc. |
Application-based constraints (business rules) | Must be handled outside the data model (by the application programs). |
Functional dependencies
Notation | Explanation |
---|---|
R(A, B, ...) | Schema of the table; R is the set of all attributes A, B, ... |
X → Y | Y is functionally dependent on X. |
r(R) | A table instance (the stored rows) of R. |
t | A row (tuple) in the table instance. |
t[A] | The row t's value for attribute A. |
X ⊆ R | A subset of the attributes in R. |
Usually abbreviated FD.
X → Y, where X, Y ⊆ R, expresses a constraint on all legal table instances of R (a functional dependency). Any two rows (tuples) t1 and t2 in an instance r(R) that have the same values for the attributes in X (that is, t1[X] = t2[X]) must also have the same values for the attributes in Y (that is, t1[Y] = t2[Y]).
In short: the FD says that all rows with the same X value must also have the same Y value.
Inference rules
These are rarely needed.
Name | Rule |
---|---|
IR-1 (reflexive) | Y ⊆ X gives X → Y |
IR-2 (augmentation) | X → Y gives XZ → YZ |
IR-3 (transitive) | X → Y and Y → Z give X → Z |
IR-4 (decomposition) | X → YZ gives X → Y |
IR-5 (additive) | X → Y and X → Z give X → YZ |
IR-6 (pseudotransitive) | X → Y and WY → Z give WX → Z |
Full functional dependency
A functional dependency X → Y is a full functional dependency if it is impossible to remove any attribute A from X and still have (X − {A}) → Y. It can be thought of as X being minimal as a determinant for Y.
Multi-valued dependencies - MVD
Let X and Y be subsets of R. A multi-valued dependency X ↠ Y means that the set of Y values associated with an X value is determined by X and is independent of the remaining attributes. This is also called an MVD (multi-value dependency).
Example: in a table of which hobbies people have, PersonNummer ↠ Hobby. There is a set of Hobby values associated with each PersonNummer value, independent of the other attributes.
Trivial multi-valued dependencies
X ↠ Y is trivial if Y is a subset of X or if X ∪ Y = R. Trivial MVDs are not considered as constraints in 4NF.
Closure
Assume F is a set of functional dependencies. The closure of F, written F⁺, is the set of all functional dependencies that can be derived from F:
F⁺ = { X → Y | X → Y can be derived from F }
Assume X ⊆ R and let F be a set of functional dependencies over R. The closure of X (under F), written X⁺, is the set of all attributes that are functionally dependent on X:
X⁺ = { A ∈ R | X → A can be derived from F }
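A small worked example (hypothetical schema and FDs, not from the lecture material), computing an attribute closure step by step:
$$
R(A, B, C, D), \quad F = \{ A \rightarrow B,\; B \rightarrow C \}
$$
$$
\{A\}^{+} = \{A\} \;\Rightarrow\; \{A, B\} \;(A \rightarrow B) \;\Rightarrow\; \{A, B, C\} \;(B \rightarrow C)
$$
Since D is not in {A}⁺, A is not a superkey of R; here the only candidate key is AD.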
Keys
Term | Explanation |
---|---|
Key attribute | An attribute that is part of one or more candidate keys. |
Non-key attribute | An attribute that is not part of any candidate key. |
Superkey | A set of attributes that together form a unique identifier. |
Key/candidate key | A minimal superkey (contains no more attributes than necessary). |
Primary key | The primary key is chosen among the candidate keys. |
Secondary key | A candidate key that is not the primary key. Here we may choose to allow NULL values. |
Normal forms
All higher normal forms presuppose the lower normal forms.
First normal form - 1NF
A table is in 1NF if and only if every attribute holds a single value from its domain.
Second normal form - 2NF
A table is in 2NF if and only if no non-key attribute is partially dependent on a candidate key.
Third normal form - 3NF
A table is in 3NF if and only if, for every functional dependency X → Y in the table, either
- X is a superkey of the table, or
- Y is a key attribute of the table.
Boyce-Codd normal form - BCNF
A table is in BCNF if and only if, for every (non-trivial) functional dependency X → Y in the table, X is a superkey of the table.
Fourth normal form - 4NF
A table is in 4NF if and only if, for every non-trivial MVD X ↠ Y, X is a superkey.
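A classic illustration (hypothetical schema, not from the lecture material) of a table that is in 3NF but not in BCNF:
$$
\text{Underviser}(\text{Student}, \text{Emne}, \text{Foreleser}), \quad
F = \{ \{\text{Student}, \text{Emne}\} \rightarrow \text{Foreleser},\;\; \text{Foreleser} \rightarrow \text{Emne} \}
$$
The candidate keys are {Student, Emne} and {Student, Foreleser}. The FD Foreleser → Emne does not violate 3NF (Emne is a key attribute), but it violates BCNF (Foreleser is not a superkey).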
Decomposition criteria
Decomposition criterion | Explanation |
---|---|
Normal form | Considers each individual projection (table) Ri; we want as high a normal form as possible. |
Attribute preservation | Every attribute of R must appear in at least one of the projections Ri, so that the same data can be stored. |
Preservation of functional dependencies | Every functional dependency in F must hold in one or more of the Ri's, or be derivable from the FDs that hold in the Ri's. |
Lossless-join property | We must be able to reconstruct the original table exactly, without being able to create spurious rows. Can be ensured by checking each split with the intersection rule, or by using the table method (Algorithm 15.3 in the textbook). |
Rules for lossless join
Term | Explanation |
---|---|
Table method | (Algorithm 15.3 in the textbook) |
Intersection rule | The decomposition of R into R1 and R2 has the lossless-join property if the common attributes (the intersection) form a superkey of one or both of the tables: (R1 ∩ R2) → R1 or (R1 ∩ R2) → R2. |
Fagin's theorem | Let X, Y and Z be subsets of R with Z = R − (X ∪ Y). The projections R1 = X ∪ Y and R2 = X ∪ Z have the lossless-join property if and only if X ↠ Y or X ↠ Z. |
Part 2: Summary
Part 2 can be summarised as follows:
- Storage and indexes
  - Database architecture (SQL compiler, SQL catalog, optimizer, executor, DB buffer)
  - Storage and indexing options
    - Heap file
    - Hash table (static and extendible)
    - B+ tree (clustered and unclustered)
    - Briefly: LSM trees, column stores, AI-generated indexes
- Query execution
  - Optimisation using data statistics
  - (Merge) sorting
  - Methods for SELECT and JOIN (nested-loop join)
- Transactions
  - Concurrency problems
    - The Lost Update Problem, The Temporary Update (or Dirty Read) Problem, The Incorrect Summary Problem, The Unrepeatable Read Problem, Phantoms
  - Transaction properties (ACID)
  - Schedules (histories)
    - Recovery properties (Unrecoverable, Recoverable, ACA, Strict)
    - (Conflict) serializability
  - Isolation levels
    - READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE
  - Two-phase locking
    - Basic, Conservative, Strict, and Rigorous
    - Concurrency problems
- Recovery
  - Write-Ahead Logging, Steal/No-Steal, and Force/No-Force
  - Checkpoints
  - Multiversion concurrency control (MVCC)
    - Snapshot Isolation
  - Deferred/Immediate update, UNDO/REDO
  - ARIES
    - Transaction Table
    - Dirty Page Table
18 Strategies for Query Processing
The scanner identifies the query tokens—such as SQL keywords, attribute names, and relation names—while the parser checks the query syntax to determine whether it is formulated according to the syntax rules (rules of grammar) of the query language.
The query optimizer module has the task of producing a good execution plan, and the code generator generates the code to execute that plan. The runtime database processor has the task of running (executing) the query code, whether in compiled or interpreted mode, to produce the query result.
Algorithms for External Sorting
Note that sorting of a particular file may be avoided if an appropriate index—such as a primary or clustering index—exists on the desired file attribute, allowing ordered access to the records of the file.
External sorting refers to sorting algorithms that are suitable for large files of records stored on disk that do not fit entirely in main memory. The typical external sorting algorithm uses a sort-merge strategy, which starts by sorting small subfiles—called runs—of the main file and then merges the sorted runs, creating larger sorted subfiles that are merged in turn.
The buffer space is divided into individual buffers, where each buffer is the same size in bytes as the size of one disk block.
In the sorting phase, runs that can fit in the available buffer space are read into main memory, sorted using an internal sorting algorithm, and written back to disk as temporary sorted subfiles (or runs).
Let nR be the number of initial runs, b the number of file blocks, and nB the available buffer space (in blocks). Then nR = ⌈b / nB⌉.
In the merging phase, the sorted runs are merged during one or more merge passes. Each merge pass can have one or more merge steps. The degree of merging (dM) is the number of sorted subfiles that can be merged in each merge step; dM = min(nB − 1, nR).
The number of merge passes is ⌈log_dM(nR)⌉. The following formula approximates the number of disk block reads and writes: (2 · b) + (2 · b · ⌈log_dM(nR)⌉).
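A quick numeric sketch with made-up sizes: b = 1024 file blocks and nB = 5 buffer blocks, so dM = nB − 1 = 4:
$$
n_R = \lceil 1024 / 5 \rceil = 205, \qquad \lceil \log_{4} 205 \rceil = 4
$$
$$
\text{cost} \approx (2 \cdot 1024) + (2 \cdot 1024 \cdot 4) = 10240 \text{ block reads and writes}
$$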
Algorithms for SELECT Operation
Search Methods for Simple Selection.
- S1 Linear search (brute force algorithm). Retrieve every record in the file, and test whether its attribute values satisfy the selection condition.
- S2 Binary search.
- S3a Using a primary index.
- S3b Using a hash key.
- S4 Using a primary index to retrieve multiple records.
- S5 Using a clustering index to retrieve multiple records.
- S6 Using a secondary (B+-tree) index on an equality comparison.
- S7a Using a bitmap index.
- S7b Using a functional index.
Method S1 (linear search) applies to any file. Method S2 (binary search) requires the file to be sorted. The methods that use an index (S3a, S4, S5, and S6) are generally referred to as index searches, and they require the appropriate index to exist. Methods S4 and S6 can be used to retrieve records in a certain range in range queries. Method S7a (bitmap index search) is suitable for retrievals where an attribute must match an enumerated set of values. Method S7b (functional index search) is suitable when the match is based on a function of one or more attributes on which a functional index exists.
Search Methods for Conjunctive Selection
If the SELECT operation has a conjunctive condition—that is, if it is made up of several simple conditions connected with the AND logical connective—the DBMS can use the following additional methods to implement the operation:
- S8 Conjunctive selection using an individual index.
- S9 Conjunctive selection using a composite index.
- S10 Conjunctive selection by intersection of record pointers.
Search Methods for Disjunctive Selection
A disjunctive condition (where simple conditions are connected by the OR logical connective rather than by AND) is much harder to process and optimize.
Only if an access path exists on every simple condition in the disjunction can we optimize the selection by retrieving the records satisfying each condition—or their record ids—and then applying the union operation to eliminate duplicates.
Estimating the Selectivity of a Condition
To minimize the overall cost of query execution in terms of resources used and response time, the query optimizer receives valuable input from the system catalog, which contains crucial statistical information about the database.
Information in the Database Catalog. A typical RDBMS catalog contains the following types of information:
- For each relation (table) r with schema R containing rR tuples:
- The number of rows/records or its cardinality: |r(R)|. We will refer to the number of rows simply as rR.
- The “width” of the relation (i.e., the length of each tuple in the relation); this tuple length is referred to as R.
- The number of blocks that relation occupies in storage: referred to as bR.
- The blocking factor bfr, which is the number of tuples per block.
- For each attribute A in relation R:
- The number of distinct values of A in R: NDV (A, R).
- The max and min values of attribute A in R: max (A, R) and min (A, R).
When the optimizer is choosing between multiple simple conditions in a conjunctive select condition, it typically considers the selectivity of each condition. The selectivity (sl) is defined as the ratio of the number of records (tuples) that satisfy the condition to the total number of records (tuples) in the file (relation).
Estimates of selectivities are possible from the information kept in the DBMS catalog and are used by the optimizer.
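As a rough illustration (made-up numbers, assuming uniformly distributed values), a common estimate for an equality condition A = v is sl = 1 / NDV(A, R), so the expected number of matching tuples is:
$$
sl(A = v) \approx \frac{1}{NDV(A, R)}, \qquad
r_R = 10000,\; NDV(A, R) = 50 \;\Rightarrow\; 10000 \cdot \frac{1}{50} = 200 \text{ tuples}
$$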
Implementing the JOIN Operation
Methods for Implementing Joins
- J1 Nested-loop join (or nested-block join).
- J2 Index-based nested-loop join (using an access structure to retrieve the matching records).
- J3 Sort-merge join.
- J4 Partition-hash join (or just hash-join).
How Buffer Space and Choice of Outer-Loop File Affect Performance of Nested-Loop Join
It is advantageous to read as many blocks as possible at a time into memory from the file whose records are used for the outer loop.
An extra buffer in main memory is needed to contain the resulting records after they are joined, and the contents of this result buffer can be appended to the result file whenever it is filled.
Let nB be the number of available buffer blocks, bInner the number of blocks in the inner-loop file, and bOuter the number of blocks in the outer-loop file. Reserving one buffer block for the inner-loop file and one for the result, we get the following total number of block read accesses: bOuter + ⌈bOuter / (nB − 2)⌉ · bInner.
If the result file of the join operation has bRES disk blocks, each block is written once to disk, so an additional bRES block accesses (writes) should be added to the preceding formulas in order to estimate the total cost of the join operation.
Join selection factor: the fraction of records in one file that will be joined with records in the other file. This factor depends on the particular equijoin condition between the two files.
The file with the high join selection factor should be used in the (single) join loop.
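A quick numeric sketch (made-up sizes) of why the choice of outer-loop file matters for nested-loop join: with nB = 7 buffer blocks (so nB − 2 = 5 are available for outer-file blocks), a small file of 10 blocks and a large file of 2000 blocks:
$$
\text{small file as outer: } 10 + \lceil 10/5 \rceil \cdot 2000 = 4010 \text{ block reads}
$$
$$
\text{large file as outer: } 2000 + \lceil 2000/5 \rceil \cdot 10 = 6000 \text{ block reads}
$$
Reading the smaller file in the outer loop saves almost 2000 block reads in this made-up case.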
20 Introduction to Transaction Processing Concepts and Theory
Introduction to Transaction Processing
Single-User versus Multiuser Systems
One criterion for classifying a database system is according to the number of users who can use the system concurrently. A DBMS is single-user if at most one user at a time can use the system, and it is multiuser if many users can use the system— and hence access the database—concurrently.
Most of the theory concerning concurrency control in databases is developed in terms of interleaved concurrency.
Transactions, Database Items, Read and Write Operations, and DBMS Buffers
A transaction is an executing program that forms a logical unit of database processing. A transaction includes one or more database access operations—these can include insertion, deletion, modification (update), or retrieval operations.
A database is basically represented as a collection of named data items. The size of a data item is called its granularity. A data item can be a database record, but it can also be a larger unit such as a whole disk block, or even a smaller unit such as an individual field (attribute) value of some record in the database.
The read-set of a transaction is the set of all items that the transaction reads, and the write-set is the set of all items that the transaction writes.
Why Concurrency Control Is Needed
- The Lost Update Problem. This problem occurs when two transactions that access the same database items have their operations interleaved in a way that makes the value of some database items incorrect.
- The Temporary Update (or Dirty Read) Problem. This problem occurs when one transaction updates a database item and then the transaction fails for some reason, while the updated item is accessed (read) by another transaction before it is changed back (rolled back) to its original value.
- The Incorrect Summary Problem. If one transaction is calculating an aggregate summary function on a number of database items while other transactions are updating some of these items, the aggregate function may calculate some values before they are updated and others after they are updated.
- The Unrepeatable Read Problem. This problem occurs when a transaction T reads the same item twice and the item is changed by another transaction T′ between the two reads.
Why Recovery Is Needed
If a transaction fails after executing some of its operations but before executing all of them, the operations already executed must be undone and have no lasting effect.
Types of Failures. Failures are generally classified as transaction, system, and media failures. There are several possible reasons for a transaction to fail in the middle of execution:
- A computer failure (system crash).
- A transaction or system error.
- Local errors or exception conditions detected by the transaction.
- Concurrency control enforcement.
- Disk failure.
- Physical problems and catastrophes.
Transaction and System Concepts
Transaction States and Additional Operations
A transaction is an atomic unit of work that should either be completed in its entirety or not done at all. For recovery purposes, the system needs to keep track of when each transaction starts, terminates, and commits, or aborts. Therefore, the recovery manager of the DBMS needs to keep track of the following operations:
- BEGIN_TRANSACTION.
- READ or WRITE.
- END_TRANSACTION.
- COMMIT_TRANSACTION.
- ROLLBACK (or ABORT).
A transaction goes into an active state immediately after it starts execution, where it can execute its READ and WRITE operations. When the transaction ends, it moves to the partially committed state. At this point, some types of concurrency control protocols may do additional checks to see if the transaction can be committed or not.
If these checks are successful, the transaction is said to have reached its commit point and enters the committed state. When a transaction is committed, it has concluded its execution successfully and all its changes must be recorded permanently in the database, even if a system failure occurs.
However, a transaction can go to the failed state if one of the checks fails or if the transaction is aborted during its active state. The terminated state corresponds to the transaction leaving the system.
The System Log
To be able to recover from failures that affect transactions, the system maintains a log to keep track of all transaction operations that affect the values of database items, as well as other transaction information that may be needed to permit recovery from failures. The log is a sequential, append-only file that is kept on disk, so it is not affected by any type of failure except for disk or catastrophic failure.
The following are the types of entries—called log records—that are written to the log file and the corresponding action for each log record.
- [start_transaction, T]. Indicates that transaction T has started execution.
- [write_item, T, X, old_value, new_value]. Indicates that transaction T has changed the value of database item X from old_value to new_value.
- [read_item, T, X]. Indicates that transaction T has read the value of database item X.
- [commit, T]. Indicates that transaction T has completed successfully, and affirms that its effect can be committed (recorded permanently) to the database.
- [abort, T]. Indicates that transaction T has been aborted.
Protocols for recovery that avoid cascading rollbacks (which include nearly all practical protocols) do not require that READ operations be written to the system log. Additionally, some recovery protocols require simpler WRITE entries that include only one of new_value or old_value instead of both.
Because the log contains a record of every WRITE operation that changes the value of some database item, it is possible to undo the effect of these WRITE operations of a transaction T by tracing backward through the log and resetting all items changed by a WRITE operation of T to their old_values. Redo of an operation may also be necessary if a transaction has its updates recorded in the log but a failure occurs before the system can be sure that all these new_values have been written to the actual database on disk from the main memory buffers.
Commit Point of a Transaction
A transaction T reaches its commit point when all its operations that access the database have been executed successfully and the effect of all the transaction operations on the database have been recorded in the log. Beyond the commit point, the transaction is said to be committed, and its effect must be permanently recorded in the database. The transaction then writes a commit record [commit, T] into the log.
If a system failure occurs, we can search back in the log for all transactions T that have written a [start_transaction, T] record into the log but have not written their [commit, T] record yet; these transactions may have to be rolled back to undo their effect on the database during the recovery process. Transactions that have written their commit record in the log must also have recorded all their WRITE operations in the log, so their effect on the database can be redone from the log records.
At the time of a system crash, only the log entries that have been written back to disk are considered in the recovery process if the contents of main memory are lost.
Before a transaction reaches its commit point, any portion of the log that has not yet been written to disk must be written to disk. This process is called force-writing the log.
DBMS-Specific Buffer Replacement Policies
If all the buffers in the DBMS cache are occupied and new disk pages are required to be loaded into main memory from disk, a page replacement policy is needed to select the particular buffers to be replaced.
Domain Separation (DS) Method. In this method, the DBMS cache is divided into separate domains (sets of buffers). Each domain handles one type of disk pages, and page replacements within each domain are handled via the basic LRU (least recently used) page replacement policy.
Hot Set Method. The hot set method determines for each database processing algorithm the set of disk pages that will be accessed repeatedly, and it does not replace them until their processing is completed.
The DBMIN Method. The DBMIN page replacement policy calculates a locality set, using the query locality set model (QLSM), for each file instance involved in the query. It then allocates the appropriate number of buffers to each file instance based on the locality set for that file instance.
Desirable Properties of Transactions
The following are the ACID properties:
- Atomicity. A transaction is an atomic unit of processing; it should either be performed in its entirety or not performed at all.
- Consistency preservation. A transaction should be consistency preserving: if it is completely executed from beginning to end without interference from other transactions, it should take the database from one consistent state to another.
- Isolation. A transaction should appear as though it is being executed in isolation from other transactions, even though many transactions are executing concurrently.
- Durability. The changes applied to the database by a committed transaction must persist in the database, and not be lost in case of failure.
It is the responsibility of the transaction recovery subsystem of a DBMS to ensure atomicity. If a transaction fails to complete for some reason the recovery technique must undo any effects of the transaction on the database.
The preservation of consistency is generally considered to be the responsibility of the programmers. A consistent state of the database satisfies the constraints specified in the schema as well as any other constraints on the database that should hold.
The isolation property is enforced by the concurrency control subsystem of the DBMS.
The durability property is the responsibility of the recovery subsystem of the DBMS.
There have been attempts to define the level of isolation of a transaction. A transaction is said to have level 0 (zero) isolation if it does not overwrite the dirty reads of higher-level transactions. Level 1 (one) isolation has no lost updates, and level 2 isolation has no lost updates and no dirty reads. Finally, level 3 isolation (also called true isolation) has, in addition to level 2 properties, repeatable reads. Another type of isolation is called snapshot isolation.
Characterizing Schedules Based on Recoverability
Schedules (Histories) of Transactions
A schedule (or history) S of n transactions T1, T2, … , Tn is an ordering of the operations of the transactions. Operations from different transactions can be interleaved in the schedule S. However, for each transaction Ti that participates in the schedule S, the operations of Ti in S must appear in the same order in which they occur in Ti.
We use the symbols b, r, w, e, c, and a for the operations begin_transaction, read_item, write_item, end_transaction, commit, and abort, respectively.
Two operations in a schedule are said to conflict if they satisfy all three of the following conditions: (1) they belong to different transactions; (2) they access the same item X; and (3) at least one of the operations is a write_item(X). Intuitively, two operations are conflicting if changing their order can result in a different outcome.
Formally, a schedule is a partial order of the operations in the n transactions. However, a total order must be specified in the schedule for any pair of conflicting operations and for any pair of operations from the same transaction.
Characterizing Schedules Based on Recoverability
It is important to characterize the types of schedules for which recovery is possible, as well as those for which recovery is relatively simple.
Once a transaction T is committed, it should never be necessary to roll back T. This ensures that the durability property of transactions is not violated. The schedules that theoretically meet this criterion are called recoverable schedules. A schedule where a committed transaction may have to be rolled back during recovery is called nonrecoverable and hence should not be permitted by the DBMS. The condition for a recoverable schedule is as follows: a schedule S is recoverable if no transaction T in S commits until all transactions T′ that have written some item X that T reads have committed. A transaction T reads from transaction T′ in a schedule S if some item X is first written by T′ and later read by T.
If sufficient information is kept (in the log), a recovery algorithm can be devised for any recoverable schedule.
However, it is possible for a phenomenon known as cascading rollback (or cascading abort) to occur in some recoverable schedules, where an uncommitted transaction has to be rolled back because it read an item from a transaction that failed.
A schedule is said to be cascadeless, or to avoid cascading rollback, if every transaction in the schedule reads only items that were written by committed transactions.
An even more restrictive type is a strict schedule, in which transactions can neither read nor write an item X until the last transaction that wrote X has committed (or aborted). In a strict schedule, undoing a write_item(X) operation of an aborted transaction is simply a matter of restoring the before image (old value) of X.
It is important to note that any strict schedule is also cascadeless, and any cascadeless schedule is also recoverable.
Characterizing Schedules Based on Serializability
Now we characterize the types of schedules that are always considered to be correct when concurrent transactions are executing. Such schedules are known as serializable schedules.
Serial, Nonserial, and Conflict-Serializable Schedules
Formally, a schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule; otherwise, the schedule is called nonserial.
The problem with serial schedules is that they limit concurrency by prohibiting interleaving of operations. For this reason, serial schedules are considered unacceptable in practice.
We would like to determine which of the nonserial schedules always give a correct result and which may give erroneous results. A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions. Saying that a nonserial schedule S is serializable is equivalent to saying that it is correct, because it is equivalent to a serial schedule, which is considered correct.
Two schedules are called result equivalent if they produce the same final state of the database. However, two different schedules may accidentally produce the same final state.
Two schedules are said to be conflict equivalent if the relative order of any two conflicting operations is the same in both schedules.
Using the notion of conflict equivalence, we define a schedule S to be serializable if it is (conflict) equivalent to some serial schedule S′.
Testing for Serializability of a Schedule
The algorithm looks at only the read_item and write_item operations in a schedule to construct a precedence graph.
An edge is created by the algorithm if a pair of conflicting operations exist in Tj and Tk and the conflicting operation in Tj appears in the schedule before the conflicting operation in Tk.
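A tiny made-up schedule illustrates the test:
S: r1(X); r2(X); w1(X); w2(X); c1; c2
Here r2(X) appears before the conflicting w1(X), giving the edge T2 → T1, while r1(X) and w1(X) both appear before the conflicting w2(X), giving the edge T1 → T2. The precedence graph therefore contains the cycle T1 → T2 → T1, so S is not conflict serializable.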
How Serializability Is Used for Concurrency Control
Being serializable is distinct from being serial. A serializable schedule gives the benefits of concurrent execution without giving up any correctness.
The approach taken in most commercial DBMSs is to design protocols (sets of rules) that—if followed by every individual transaction or if enforced by a DBMS concurrency control subsystem—will ensure serializability of all schedules in which the transactions participate.
One such protocol, two-phase locking, is based on locking data items to prevent concurrent transactions from interfering with one another.
Transaction Support in SQL
The access mode can be specified as READ ONLY or READ WRITE.
The diagnostic area size option, DIAGNOSTIC SIZE n, specifies an integer value n, which indicates the number of conditions that can be held simultaneously in the diagnostic area.
The isolation level option is specified using the statement ISOLATION LEVEL isolation, where the value for isolation can be READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, or SERIALIZABLE.
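A hedged sketch of how these options can be combined in a single statement (exact syntax and supported options vary between DBMSs; many systems accept only the isolation level clause):
-- Sketch: set access mode and isolation level for the next transaction
SET TRANSACTION READ WRITE, ISOLATION LEVEL READ COMMITTED;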
If a transaction executes at a lower isolation level than SERIALIZABLE, then one or more of the following three violations may occur:
- Dirty read.
- Nonrepeatable read.
- Phantoms.
Snapshot Isolation.
The basic definition of snapshot isolation is that a transaction sees the data items that it reads based on the committed values of the items in the database snapshot (or database state) when the transaction starts. Snapshot isolation will ensure that the phantom record problem does not occur, since the database transaction, or in some cases the database statement, will only see the records that were committed in the database at the time the transaction starts.
21 Concurrency Control Techniques
Two-Phase Locking Techniques for Concurrency Control
A lock is a variable associated with a data item that describes the status of the item with respect to possible operations that can be applied to it.
Types of Locks and System Lock Tables
Shared/Exclusive (or Read/Write) Locks. We should allow several transactions to access the same item X if they all access X for reading purposes only. This is because read operations on the same item by different transactions are not conflicting. There are three locking operations: read_lock(X), write_lock(X), and unlock(X). Three possible states: read-locked, write-locked, or unlocked. A read-locked item is also called share-locked because other transactions are allowed to read the item, whereas a write-locked item is called exclusive-locked because a single transaction exclusively holds the lock on the item.
When we use the shared/exclusive locking scheme, the system must enforce the following rules:
- A transaction T must issue the operation read_lock(X) or write_lock(X) before any read_item(X) operation is performed in T.
- A transaction T must issue the operation write_lock(X) before any write_item(X) operation is performed in T.
- A transaction T must issue the operation unlock(X) after all read_item(X) and write_item(X) operations are completed in T. (rigorous)
- A transaction T will not issue an unlock(X) operation unless it already holds a read (shared) lock or a write (exclusive) lock on item X.
Some schemes allow lock conversion; that is, a transaction that already holds a lock on item X is allowed, under certain conditions, to convert the lock from one locked state to another. If T is the only transaction holding a read lock on X at the time it issues the write_lock(X) operation, the lock can be upgraded. It is also possible for a transaction T to issue a write_lock(X) and then later to downgrade the lock by issuing a read_lock(X) operation.
To guarantee serializability, we must follow an additional protocol concerning the positioning of locking and unlocking operations in every transaction. The best-known protocol, two-phase locking, is described in the next section.
Guaranteeing Serializability by Two-Phase Locking
A transaction is said to follow the two-phase locking protocol if all locking operations (read_lock, write_lock) precede the first unlock operation in the transaction.
It can be proved that, if every transaction in a schedule follows the two-phase locking protocol, the schedule is guaranteed to be serializable, obviating the need to test for serializability of schedules.
Two-phase locking may limit the amount of concurrency that can occur in a schedule.
Basic, Conservative, Strict, and Rigorous Two-Phase Locking.
The technique just described is known as basic 2PL.
A variation known as conservative 2PL (or static 2PL) requires a transaction to lock all the items it accesses before the transaction begins execution, by predeclaring its read-set and write-set. Conservative 2PL is a deadlock-free protocol.
the most popular variation of 2PL is strict 2PL, which guarantees strict schedules. In this variation, a transaction T does not release any of its exclusive (write) locks until after it commits or aborts. Strict 2PL is not deadlock-free.
A more restrictive variation of strict 2PL is rigorous 2PL, which also guarantees strict schedules. In this variation, a transaction T does not release any of its locks (exclusive or shared) until after it commits or aborts, and so it is easier to implement than strict 2PL.
Dealing with Deadlock and Starvation
Locking is generally considered to have a high overhead, because every read or write operation is preceded by a system locking request. The use of locks can also cause two additional problems: deadlock and starvation.
Deadlock occurs when each transaction T in a set of two or more transactions is waiting for some item that is locked by some other transaction T′ in the set.
One way to prevent deadlock is to use a deadlock prevention protocol. These protocols are not generally used in practice, either because of unrealistic assumptions or because of their possible overhead. Some schemes that prevent deadlock are wait-die, wound-wait and cautious waiting.
An alternative approach to dealing with deadlock is deadlock detection. A simple way to detect a state of deadlock is for the system to construct and maintain a wait-for graph.
Another simple scheme to deal with deadlock is the use of timeouts. In this method, if a transaction waits for a period longer than a system-defined timeout period, the system assumes that the transaction may be deadlocked and aborts it—regardless of whether a deadlock actually exists.
Starvation occurs when a transaction cannot proceed for an indefinite period of time while other transactions in the system continue normally. This may occur if the waiting scheme for locked items is unfair, in that it gives priority to some transactions over others. One solution for starvation is to have a fair waiting scheme, such as using a first-come-first-served queue. Another scheme allows some transactions to have priority over others but increases the priority of a transaction the longer it waits. The wait-die and wound-wait schemes discussed previously avoid starvation, because they restart a transaction that has been aborted with its same original timestamp.
Multiversion Concurrency Control Techniques
These protocols for concurrency control keep copies of the old values of a data item when the item is updated (written); they are known as multiversion concurrency control because several versions (values) of an item are kept by the system. An obvious drawback of multiversion techniques is that more storage is needed to maintain multiple versions of the database items.
One reason for keeping multiple versions is that some read operations that would be rejected in other techniques can still be accepted by reading an older version of the item to maintain serializability.
The extreme case is a temporal database which keeps track of all changes and the times at which they occurred.
The snapshot isolation technique can also be implemented by keeping older versions of items in a temporary store.
22 Database Recovery Techniques
Recovery Concepts
Recovery Outline and Categorization of Recovery Algorithms
Recovery from transaction failures usually means that the database is restored to the most recent consistent state before the time of failure. To do this, the system must keep information about the changes that were applied to data items by the various transactions. This information is typically kept in the system log.
- If there is extensive damage to a wide portion of the database due to catastrophic failure, such as a disk crash, the recovery method restores a past copy of the database that was backed up to archival storage and reconstructs a more current state by reapplying or redoing the operations of committed transactions from the backed-up log, up to the time of failure.
- When the database on disk is not physically damaged, the recovery strategy is to identify any changes that may cause an inconsistency in the database. For example, a transaction that has updated some database items on disk but has not been committed needs to have its changes reversed by undoing its write operations. It may also be necessary to redo some operations in order to restore a consistent state of the database; for example, if a transaction has committed but some of its write operations have not yet been written to disk. The entries kept in the online system log on disk are analyzed to determine the appropriate actions for recovery.
Conceptually, we can distinguish two main policies for recovery from noncatastrophic transaction failures: deferred update and immediate update. The deferred update techniques do not physically update the database on disk until after a transaction commits; then the updates are recorded in the database. Deferred update is also known as the NO-UNDO/REDO algorithm.
In the immediate update techniques, the database may be updated by some operations of a transaction before the transaction reaches its commit point. However, these operations must also be recorded in the log on disk by force-writing before they are applied to the database on disk, making recovery still possible. In the general case of immediate update, both undo and redo may be required during recovery. This technique, known as the UNDO/REDO algorithm, requires both operations during recovery and is used most often in practice. A variation of the algorithm where all updates are required to be recorded in the database on disk before a transaction commits requires undo only, so it is known as the UNDO/NO-REDO algorithm.
The UNDO and REDO operations are required to be idempotent—that is, executing an operation multiple times is equivalent to executing it just once. In fact, the whole recovery process should be idempotent, in case the system were to fail during the recovery process.
Caching (Buffering) of Disk Blocks
Typically, multiple disk pages that include the data items to be updated are cached into main memory buffers and then updated in memory before being written back to disk.
Associated with each buffer in the cache is a dirty bit, which can be included in the directory entry to indicate whether or not the buffer has been modified. When the buffer contents are replaced (flushed) from the cache, the contents must first be written back to the corresponding disk page only if its dirty bit is 1.
A pin-unpin bit is also needed: a page in the cache is pinned (bit value 1) if it cannot be written back to disk yet. For example, the recovery protocol may restrict certain buffer pages from being written back to the disk until the transactions that changed this buffer have committed.
Two main strategies can be employed when flushing a modified buffer back to disk. In-place updating writes the buffer to the same original disk location, thus overwriting the old value of any changed data items on disk. Shadowing writes an updated buffer at a different disk location, so multiple versions of data items can be maintained, but this approach is not typically used in practice.
In general, the old value of the data item before updating is called the before image (BFIM), and the new value after updating is called the after image (AFIM).
Write-Ahead Logging, Steal/No-Steal, and Force/No-Force
When in-place updating is used, it is necessary to use a log for recovery. The recovery mechanism must ensure that the BFIM of the data item is recorded in the appropriate log entry and that the log entry is flushed to disk before the BFIM is overwritten with the AFIM in the database on disk. This process is generally known as write-ahead logging and is necessary so we can UNDO the operation if this is required during recovery.
we need to distinguish between two types of log entry information included for a write command: the information needed for UNDO and the information needed for REDO. A REDO-type log entry includes the new value (AFIM) of the item written by the operation. The UNDO-type log entries include the old value (BFIM) of the item since this is needed to undo the effect of the operation. In an UNDO/REDO algorithm, both BFIM and AFIM are recorded into a single log entry.
The log is simply a sequential (append-only) disk file.
Standard DBMS recovery terminology includes the terms steal/no-steal and force/no-force, which specify the rules that govern when a page from the database cache can be written to disk:
Steal/No-Steal. If a cache buffer page updated by a transaction cannot be written to disk before the transaction commits, the recovery method is called a no-steal approach. If the recovery protocol allows writing an updated buffer before the transaction commits, it is called steal. Steal is used when the DBMS cache (buffer) manager needs a buffer frame for another transaction and the buffer manager replaces an existing page that had been updated but whose transaction has not committed. The no-steal rule means that UNDO will never be needed during recovery, since a committed transaction will not have any of its updates on disk before it commits.
Force/No-Force. If all pages updated by a transaction are immediately written to disk before the transaction commits, the recovery approach is called a force approach. Otherwise, it is called no-force. The force rule means that REDO will never be needed during recovery, since any committed transaction will have all its updates on disk before it is committed.
The deferred update (NO-UNDO) recovery scheme discussed in Section 22.2 follows a no-steal approach. However, typical database systems employ a steal/no-force (UNDO/REDO) strategy. The advantage of steal is that it avoids the need for a very large buffer space to store all updated pages in memory. The advantage of no-force is that an updated page of a committed transaction may still be in the buffer when another transaction needs to update it, thus eliminating the I/O cost to write that page multiple times to disk and possibly having to read it again from disk. This may provide a substantial saving in the number of disk I/O operations when a specific page is updated heavily by multiple transactions.
Checkpoints in the System Log and Fuzzy Checkpointing
A [checkpoint, list of active transactions] record is written into the log periodically, at a point when the system has written out to the database on disk all DBMS buffers that have been modified. All transactions that have their [commit, T] entries in the log before a [checkpoint] entry do not need to have their WRITE operations redone after a system crash.
Taking a checkpoint consists of the following actions:
1. Suspend execution of transactions temporarily.
2. Force-write all main memory buffers that have been modified to disk.
3. Write a [checkpoint] record to the log, and force-write the log to disk.
4. Resume executing transactions.
Fuzzy checkpointing. In this technique, the system can resume transaction processing after a [begin_checkpoint] record is written to the log, without having to wait for step 2 to finish. When step 2 is completed, an [end_checkpoint, …] record is written to the log with the relevant information collected during checkpointing.
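A minimal sketch of the four checkpoint actions above, with hypothetical names; forcing buffers and the log to disk is only simulated with plain dictionary and list operations.

```python
# Sketch of (non-fuzzy) checkpointing; all names are hypothetical stand-ins.
def take_checkpoint(active_transactions: list[str],
                    dirty_buffers: dict[int, bytes],
                    disk: dict[int, bytes],
                    log: list) -> None:
    # 1. Suspend execution of transactions temporarily (assumed done by the caller).
    # 2. Force-write all modified main memory buffers to disk.
    for page_id, contents in dirty_buffers.items():
        disk[page_id] = contents
    dirty_buffers.clear()
    # 3. Write a [checkpoint] record to the log and force-write the log to disk
    #    (forcing is simulated here by the append itself).
    log.append(["checkpoint", list(active_transactions)])
    # 4. Resume executing transactions (again left to the caller).

# Fuzzy checkpointing would instead append a [begin_checkpoint] record, let
# transactions resume at once, flush the buffers in the background, and append
# an [end_checkpoint, ...] record when the flushing finishes.
```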
Transaction Rollback and Cascading Rollback
If a transaction fails for whatever reason after updating the database, but before it commits, it may be necessary to roll back the transaction. The UNDO-type log entries are used to restore the old values of the data items that must be rolled back.
If a transaction T is rolled back, any transaction S that has, in the interim, read the value of some data item X written by T must also be rolled back. This phenomenon is called cascading rollback, and it can occur when the recovery protocol ensures recoverable schedules but does not ensure strict or cascadeless schedules. Almost all recovery mechanisms are designed so that cascading rollback is never required.
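A small, hypothetical illustration of how a rollback cascades: starting from the failed transaction, every transaction that has read a value written by an already doomed transaction is added to the set that must be rolled back.

```python
# Sketch: computing which transactions must be cascaded when a transaction fails.
# reads_from[s] is the set of transactions whose dirty writes transaction s has read.
def cascade_set(failed: str, reads_from: dict[str, set[str]]) -> set[str]:
    doomed = {failed}
    changed = True
    while changed:
        changed = False
        for s, sources in reads_from.items():
            if s not in doomed and sources & doomed:
                doomed.add(s)          # s read an item written by a doomed transaction
                changed = True
    return doomed

# Example: T2 read a value written by T1, and T3 read a value written by T2.
# If T1 fails, both T2 and T3 must also be rolled back.
print(sorted(cascade_set("T1", {"T2": {"T1"}, "T3": {"T2"}})))  # ['T1', 'T2', 'T3']
```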
NO-UNDO/REDO Recovery Based on Deferred Update
The idea behind deferred update is to defer or postpone any actual updates to the database on disk until the transaction completes its execution successfully and reaches its commit point. Deferred update can generally be characterized as a no-steal approach.
During transaction execution, the updates are recorded only in the log and in the cache buffers. If a transaction fails before reaching its commit point, there is no need to undo any operations because the transaction has not affected the database on disk in any way. Therefore, only REDO-type log entries, which include the new value (AFIM), are needed in the log.
It cannot be used in practice unless transactions are short and each transaction changes few items; for other types of transactions, there is the potential for running out of buffer space.
During recovery, it is only necessary to REDO the last update of each item X from the log, because any earlier updates would be overwritten by this last REDO anyway.
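A minimal sketch of NO-UNDO/REDO recovery under an assumed log format of ('write', tid, item, AFIM) and ('commit', tid) tuples: only committed transactions are considered, and only the last AFIM per item is reapplied.

```python
# Sketch of NO-UNDO/REDO (deferred update) recovery; the log format is assumed.
def redo_committed(log: list[tuple], db_disk: dict) -> None:
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    last_afim: dict[str, object] = {}
    for rec in log:                      # forward scan of the log
        if rec[0] == "write" and rec[1] in committed:
            _, tid, item, afim = rec
            last_afim[item] = afim       # keep only the final value per item
    for item, afim in last_afim.items():
        db_disk[item] = afim             # REDO: reapply committed AFIMs

# Uncommitted transactions are simply ignored: no UNDO is ever needed, because
# a deferred-update (no-steal) scheme never wrote their changes to disk.
```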
Recovery Techniques Based on Immediate Update
In these techniques, when a transaction issues an update command, the database on disk can be updated immediately, without waiting for the transaction to reach its commit point.
Provisions must be made for undoing the effect of update operations that have been applied to the database by a failed transaction. This is accomplished by rolling back the transaction and undoing the effect of the transaction’s write_item operations. Therefore, the UNDO-type log entries, which include the old value (BFIM) of the item, must be stored in the log. Because UNDO can be needed during recovery, these methods follow a steal strategy.
Theoretically, we can distinguish two main categories of immediate update algorithms.
- UNDO/NO-REDO recovery algorithm. In this method, all updates made by a transaction must be recorded on disk before the transaction commits, so that REDO is never needed. This corresponds to a steal/force strategy.
- UNDO/REDO recovery algorithm. If the transaction is allowed to commit before all its changes are written to the database, we have the most general case. Here the steal/no-force strategy is applied; a sketch of this case follows the list.
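A minimal sketch of the general UNDO/REDO case, assuming log records of the form ('write', tid, item, BFIM, AFIM) and ('commit', tid): committed transactions are redone in a forward scan, while uncommitted ones are undone in a backward scan using the BFIMs.

```python
# Sketch of UNDO/REDO recovery (immediate update); the log format is assumed.
def undo_redo_recover(log: list[tuple], db_disk: dict) -> None:
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # REDO phase: forward scan, reapply AFIMs of committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, tid, item, bfim, afim = rec
            db_disk[item] = afim
    # UNDO phase: backward scan, restore BFIMs written by uncommitted transactions.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, tid, item, bfim, afim = rec
            db_disk[item] = bfim
```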
Shadow Paging
When a transaction begins executing, the current directory—whose entries point to the most recent or current database pages on disk—is copied into a shadow directory. The shadow directory is then saved on disk while the current directory is used by the transaction.
During transaction execution, the shadow directory is never modified. When a write_item operation is performed, a new copy of the modified database page is created, but the old copy of that page is not overwritten; instead, the new page is written elsewhere on disk and the current directory entry is updated to point to it.
To recover from a failure during transaction execution, it is sufficient to free the modified database pages and to discard the current directory. The state of the database before transaction execution is available through the shadow directory, and that state is recovered by reinstating the shadow directory.
Committing a transaction corresponds to discarding the previous shadow directory.
This technique can be categorized as a NO-UNDO/NO-REDO technique for recovery.
In a multiuser environment with concurrent transactions, logs and checkpoints must be incorporated into the shadow paging technique. If the directory is large, the overhead of writing shadow directories to disk as transactions commit is significant.
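A minimal sketch of shadow paging for a single transaction, with hypothetical names: begin copies the current directory into a shadow, writes go to fresh disk blocks and only change the current directory, commit discards the shadow, and recovery reinstates it.

```python
# Sketch of shadow paging (single-user case); all names are hypothetical.
import itertools

class ShadowPagedDB:
    def __init__(self, pages: dict[int, bytes]):
        self.disk = dict(pages)                       # physical blocks on disk
        self.current = {i: i for i in pages}          # logical page -> disk block
        self.shadow = None
        self._next_block = itertools.count(max(pages, default=0) + 1)

    def begin(self):
        self.shadow = dict(self.current)              # copy directory, save as shadow

    def write_item(self, page: int, contents: bytes):
        new_block = next(self._next_block)            # old copy is never overwritten
        self.disk[new_block] = contents
        self.current[page] = new_block                # only the current directory changes

    def commit(self):
        self.shadow = None                            # discard the shadow directory

    def recover(self):
        self.current = dict(self.shadow)              # state before the transaction
```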
The ARIES Recovery Algorithm
ARIES uses a steal/no-force approach for writing, and it is based on three concepts: write-ahead logging, repeating history during redo, and logging changes during undo. Repeating history means that ARIES retraces all actions of the database system prior to the crash to reconstruct the database state as it was when the crash occurred. Transactions that were uncommitted at the time of the crash (active transactions) are then undone. Logging during undo prevents ARIES from repeating completed undo operations if a failure occurs during recovery.
The ARIES recovery procedure consists of three main steps: analysis, REDO, and UNDO. The analysis step identifies the dirty (updated) pages in the buffer and the set of transactions active at the time of the crash. The appropriate point in the log where the REDO operation should start is also determined. The REDO phase actually reapplies updates from the log to the database. Generally, the REDO operation is applied only to committed transactions; however, this is not the case in ARIES.
Information stored by ARIES in the log and in the data pages allows it to determine whether the operation to be redone has actually been applied to the database and therefore does not need to be reapplied. Thus, only the necessary REDO operations are applied during recovery. Finally, during the UNDO phase, the log is scanned backward and the operations of transactions that were active at the time of the crash are undone in reverse order. The information needed for ARIES to accomplish its recovery procedure includes the log, the Transaction Table, and the Dirty Page Table.
In ARIES, every log record has an associated log sequence number (LSN). Each LSN corresponds to a specific change (action) of some transaction. Also, each data page will store the LSN of the latest log record corresponding to a change for that page. A log record is written for any of the following actions: updating a page (write), committing a transaction (commit), aborting a transaction (abort), undoing an update (undo), and ending a transaction (end).
Common fields in all log records include the previous LSN for that transaction, the transaction ID, and the type of log record. The previous LSN is important because it links the log records (in reverse order) for each transaction.
In addition to the log, two tables are needed for efficient recovery: the Transaction Table and the Dirty Page Table, which are maintained by the transaction manager. When a crash occurs, these tables are rebuilt in the analysis phase of recovery. The Transaction Table contains an entry for each active transaction, with information such as the transaction ID, transaction status, and the LSN of the most recent log record for the transaction. The Dirty Page Table contains an entry for each dirty page in the DBMS cache, which includes the page ID and the LSN corresponding to the earliest update to that page.
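A minimal sketch of these ARIES structures; the exact field layout is an assumption for illustration, not the precise ARIES record format.

```python
# Sketch of ARIES bookkeeping structures; field names are illustrative only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    lsn: int                   # log sequence number of this record
    prev_lsn: Optional[int]    # previous LSN of the same transaction (backward chain)
    tid: str                   # transaction ID
    type: str                  # 'write', 'commit', 'abort', 'undo' (CLR) or 'end'
    page_id: Optional[int] = None
    bfim: Optional[bytes] = None
    afim: Optional[bytes] = None

# Transaction Table: one entry per active transaction.
transaction_table: dict[str, dict] = {}   # tid -> {"status": ..., "last_lsn": ...}

# Dirty Page Table: one entry per dirty page in the DBMS cache.
dirty_page_table: dict[int, int] = {}     # page_id -> LSN of the earliest update to that page
```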
Checkpointing in ARIES consists of the following: writing a begin_checkpoint record to the log, writing an end_checkpoint record to the log, and writing the LSN of the begin_checkpoint record to a special file. This special file is accessed during recovery to locate the last checkpoint information. With the end_checkpoint record, the contents of both the Transaction Table and the Dirty Page Table are appended to the end of the log. To reduce the cost, fuzzy checkpointing is used so that the DBMS can continue to execute transactions during checkpointing.
During analysis, the log records being analyzed may cause modifications to these two tables. For instance, if an end log record is encountered for a transaction T in the Transaction Table, then the entry for T is deleted from that table. If some other type of log record is encountered for a transaction T′, then an entry for T′ is inserted into the Transaction Table, if it is not already present, and the last LSN field is updated. If the log record is an update record for a page that is not already in the Dirty Page Table, an entry for that page is added with the record's LSN.
The REDO phase follows next. To reduce the amount of unnecessary work, ARIES starts redoing at a point in the log where it knows (for sure) that previous changes to dirty pages have already been applied to the database on disk. It can determine this by finding the smallest LSN, M, of all the dirty pages in the Dirty Page Table, which indicates the log position where ARIES needs to start the REDO phase.
For each change recorded in the log, the REDO algorithm verifies whether or not the change has to be reapplied. For example, if a change recorded in the log pertains to a page P that is not in the Dirty Page Table, then the change is already on disk and does not need to be reapplied. Likewise, if a change recorded in the log (with LSN = N, say) pertains to page P and the Dirty Page Table contains an entry for P whose LSN is greater than N, then the change is already on disk. If neither of these two conditions holds, page P is read from disk and the LSN stored on that page, LSN(P), is compared with N. If LSN(P) ≥ N, the change has already been applied and the page does not need to be rewritten to disk; otherwise, the update is reapplied and LSN(P) is set to N.
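A minimal sketch of this REDO decision for one logged update with LSN = n on page p, assuming dictionaries for the Dirty Page Table and for the LSNs stored on the disk pages.

```python
# Sketch of the ARIES REDO-phase decision for one logged update (LSN = n, page = p).
# dirty_page_table maps page_id -> rec_lsn; page_lsn holds LSN(P) as stored on disk.
def maybe_redo(n: int, p: int, dirty_page_table: dict[int, int],
               page_lsn: dict[int, int]) -> bool:
    if p not in dirty_page_table:
        return False     # page not dirty at the crash: the change is already on disk
    if dirty_page_table[p] > n:
        return False     # earliest dirty change to P is later than n: already on disk
    if page_lsn.get(p, 0) >= n:
        return False     # the page itself records a change at or after n: already applied
    return True          # otherwise reapply the update (and set LSN(P) to n)
```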
Once the REDO phase is finished, the database is in the exact state that it was in when the crash occurred. Now, the UNDO phase proceeds by scanning backward from the end of the log and undoing the appropriate actions. A compensating log record is written for each action that is undone.
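A minimal sketch of the UNDO phase under an assumed log format (dictionaries with "type", "tid", "item", "old" and "new" fields, not the exact ARIES layout): the log is scanned backward, updates of transactions active at the crash are undone by restoring their BFIMs, and a compensating log record is appended for each undone action.

```python
# Sketch of an ARIES-style UNDO phase; the log record layout is assumed.
def undo_phase(log: list[dict], active_tids: set[str], db_disk: dict) -> None:
    for rec in reversed(list(log)):                   # backward scan over a snapshot of the log
        if rec["type"] == "write" and rec["tid"] in active_tids:
            db_disk[rec["item"]] = rec["old"]         # restore the BFIM
            # Compensating log record (CLR): records the undo so it is never
            # repeated if another crash happens during recovery.
            log.append({"type": "undo", "tid": rec["tid"],
                        "item": rec["item"], "new": rec["old"]})
    for tid in active_tids:
        log.append({"type": "end", "tid": tid})       # mark the rollback as complete
```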
In the example log (the accompanying figure is not reproduced here), a checkpoint has occurred, so the address of the associated begin_checkpoint record is retrieved; it is location 4. The analysis phase starts from location 4 and proceeds to the end of the log.
For the REDO phase, the smallest LSN in the Dirty Page Table is 1, so REDO starts at log record 1 and proceeds with the REDO of updates. The LSNs {1, 2, 6, 7}, corresponding to the updates for pages C, B, A, and C, respectively, are not less than the LSNs of those pages in the Dirty Page Table, so those data pages are read again and the updates reapplied from the log (assuming the LSNs actually stored on the data pages are smaller than the corresponding log record LSNs).
UNDO is applied only to the active transaction T3. The UNDO phase starts at log entry 6 (the last update for T3) and proceeds backward in the log. The backward chain of updates for transaction T3 (only log record 6 in this example) is followed and undone.