SQL Server

Friday, 20 January 2017

Memory and Indexes

Generally in a computer, Data is stored on the disc in the form of zeros and ones (1, 0). The disc is either made with non-magnetic material, which is coated with a thin layer of magnetic material (CD/DVD’s) or solid-state drives (Memory Cards/HDD). In the magnetic disks data is stored by magnetizing the thin film and spinning the disk. And in the solid-state drives data is stored eclectically. BIT is a basic storage area on the disc. A bit can store either zero or one (0 or 1). If that area (or bit) electrically ON then it is treated as 1 else treated as 0. Each and every character from the computer keyboard internally converted to ASCII code, and next to machine understandable (binary) language i.e. 1's and 0's. Operating system is a set of programs, that can convert High-level language (Human understandable language e.g. English, C) to Low-level language (Machine understandable language i.e. binary) and vice versa.

For example,

8 bits = 1 byte

1024 bytes = 1 KB (Kilo Bytes)

1024 KB’s = 1 MB (Mega Byte)

1024 MB’s = 1 GB (Giga Byte)

1024 GB’s = 1 TB (Terabytes)

1024 TB’s = 1 PB (Peta byte)

1024 PB’s = 1 EB (Exabyte)

Data in SQL Server is arranged in the form of Pages and Extents.

A Page is basic storage area in SQL Server, having 8 KB size on the disc, in that 96 bytes allocated to page header, and which contains information about the page like page number, page type, the amount of free space on the page, and the allocation unit ID of the object that owns the page etc.

A table is a collection of Rows and Columns. When you insert Rows (logical sequence) in to a table those rows internally spread over to multiple pages (physical sequence) on the disk. A row stores all respective columns either by value or by reference. One or more rows are sequentially packed into a page, once the page is full continues to next page. Like this as many as possible rows are fit into a page. When a row needs to be accessed, the respective page copied into buffer memory. Here the deviation between logical and physical sequence is treated as 'External Fragmentation'. And the percentage of empty space within a page treated as 'Internal Fragmentation' of that page.

Extents are a collection of eight physically contiguous pages and are used to efficiently manage the pages. All pages are stored in extents.

INDEX

When there are thousands of records in a table, retrieving data will take a long time. By using INDEX we can increase the performance of retrieving data from database.
Indexes speed up the querying process by providing fast access to rows in the data tables, similarly to the way a book’s index helps you find information quickly within that book.
Therefore indexes are created on columns which are accessed frequently, so that the information can be retrieved quickly.
Indexes can be created on a single column or a group of columns.

HEAP: A table that does not have an Index is called a HEAP.

Indexes are introduced in SQL Server to find or modify requested data in minimum amount of time using few system resources to achieve max performance. In general SQL Server has only two possible ways to access the requested data from the database.

It can scan each and every row from the beginning to end, if it meet the requested criteria then it fetch the requested data.
Or if an index available, it uses that index to locate the requested data.

Indexes took the space on the disk and must be kept in sync with tables. SQL Server indexes are implemented a type of B-TREE structure. The B-TREE architecture consist ROOT level, NON-LEAF level and LEAF level.

Every Index has a single ROOT level on top used as starting point for traversing an index.
All the index levels above the LEAF level including the ROOTs known as the NON-LEAF levels.
The LEAF level is the bottom level of the index structure it contains Key-Value index entries that either reference the rows in the data pages or complete data rows.

SQL Server supports two types of Indexes, Clustered and non-Clustered. Both are having the similar B-Tree structure.

Clustered index: Here the LEAF level has the actual data pages from the table sorted in ascending order. The order of the values in the index pages also in ascending. There can be only one clustered index per table, and might be on one or more columns. SQL Server internally maintains uniqueness of key values for each row even if the column data is not unique. For example in a Clustered Index column data inserted ‘RADHA’ as first instance, here all key values are unique. When subsequent values of ‘RADHA’ are inserted again SQL Server generates an internal number to maintain uniqueness of key-value pair.

Non clustered index: Non clustered index has the same B-Tree structure as clustered index with a difference. The LEAF level of Non-Clustered index contains key values instead of actual data. These key-values are pointed to Clustered Index keys (if that table has the Clustered Index) or the rows in the data pages.

The implementation a Non-Clustered index depends on whether the data pages of a table as a HEAP, or as a Clustered Index. It means that if the table does not have a Clustered Index the LEAF level pointed to rows in the data pages. If the table has a Clustered Index SQL Server builds the Non-Clustered Index on top on Clustered Index so that Non-Clustered LEAF level pages use the Clustered Index.

(Continue reading on next post…)

Friday, 30 December 2016

ISOLATION LEVELS

The thumb rule of RDBMS is every transaction should satisfy the ACID properties. Here I stands for Isolation, means isolate from other simultaneous transactions. (Do protected by or not disturb other transactions). This is achieved by putting the locks on your transaction.

First of all we’ll try to understand what is the lock? In short a lock is a In-Memory structure having 64 bytes(on 32 bit OS) or 128 bytes(on 64 bit OS) in size with different attributes like owner, type, resource hash etc and it links to the actual resource(row, page, table, file, database, etc). SQL Server supports more than 20 different lock types but below are the most frequently used and important ones.

i. Shared Lock (S): If any resource is being held by shared lock through a first transaction, it allows all other transactions only to read (SELECT) operations. But other transactions can’t modify the data until the shared lock (S) exists on the resource. And it released when the read operation completes, unless the transaction isolation level is set to repeatable read or etc. Shared locks can compatible with other shared locks, so that a resource (data) can hold multiple shared locks at a time.

ii. Exclusive Lock (X): If any resource is being held by Exclusive lock through a first transaction, no other transaction can read (SELECT) or modify (INSERT, UPDATE, DELETE) the data. If you want to read use NOLOCK hint or read uncommitted isolation level. These are not compatible with any other locks including shared locks.

iii. Update Lock (U): For any DML operation first it read the data next do the modification. Reading a resource the transaction applies a shared lock (S) and next converts it to exclusive lock (X) for modification. But more than one transactions acquire shared locks (S) on a same resource one can attempt lock conversion to exclusive (X) must wait until existed shared locks release because exclusive lock not compatible with any other locks and it leads to blocking. This time if another shared lock attempts to resource as covert to exclusive lock leads to deadlocks. To prevent such scenarios update locks (U) are used. It can be placed on a resource when that already has a shared lock. It is used by a resource when they are selected for update, and compatible only with shared locks. For example a transaction having update data under a specific condition. SQL Server acquires update lock (U) for every row it processes while checking the condition. When eligible row found, SQL Server converts (U) lock to (X).

iv. Schema Lock: When transactions related to schemas, the Schema Locks will be triggered. For example when performing DDL operations, such as adding a column or dropping a table DB Engine uses Schema-Modification (Sch-M) lock for preventing access on the table. Means it block all other transactions related to this table until the lock released. And when compiling and executing queries DB Engine uses Schema-Stability (Sch-S) locks and it do not block any transnational locks.

v. Intent Locks (IS, IX, IU, etc): By using this type of locks SQL Server shows multigranular locking behavior, which mean locking at multiple levels. Intent Locks are technically not needed by SQL Server, because they only indicate if there is some other specific Lock Type somewhere else within the Lock Hierarchy. What this means is that before you hold a lock on the page or the row level an intent lock is set on the table. This prevents other transactions from putting exclusive locks on the table that would try to cancel the row/page lock. Intent locks include intent shared (IS), intent exclusive (IX), and shared with intent exclusive (SIX).

But what type of locks and how they can be work decided by the isolation level that you set on database. When you are connecting from a client application (UI) to database, there are different layers (OLEDB Provider for SQL Server, ODBC Driver for SQL Server, or .NET’s SqlClient class) puts these isolation levels programmatically to the database. And also we can set explicitly by SET TRANSACTION ISOLATION LEVEL statement.

SET command is a connection or a session specific, which means it applies only on current session. If you make a new connection (open a new window in the Query Analyzer) it will be back to the default isolation level.

There are five isolation levels defined in SQL Server targeting to different scenarios of use, first four are introduced in SQL Server 2000 and the last one (Snapshot) is from SQL Server 2005.
1. Read Uncommitted: This is the lowest isolation level and do not issues any type of locks. So a transaction can read the data even another transaction not yet committed. This is called Dirty Read.

2. Read Committed: This is the default isolation level on each database. It prevents transactions to read data until the first transaction commits respective data (reads only committed data). Simply it restricts a user from seeing any intermediate, uncommitted data (dirty reads) by using shared lock or row versioning. But it won’t prevent repeatable reads and Phantom reads.
REPEATABLE READS: After the first data read the shared lock will be released and if the other transaction modifies the data (as other transactions can acquire exclusive locks) before a subsequent data read, the subsequent data read will see a different set of data than the previous data read. In other words, In between two successful reads of a same transaction’s same data might be modified by another transaction. It leads different result sets.

3. Repeatable Read: To avoid Repeatable Reads we use this isolation. In this isolation shared locks can prevents data modification by other transactions when the current transaction being hold for READ. And also prevents data read operations when the current transaction holds for UPDATE (reads after committed transaction). But it can’t prevent Phantom Reads.
PHANTOM READS: Data getting changed or new rows can be added by other transactions to current transaction. The current transactions search operation can leads different results.

4. Serializable: This isolation doesn’t allow any other transactions to read (SELECT) or write (INSERT, UPDATE and DELETE) the data until the current transaction completed. As a result, the Serializable isolation level prevents dirty reads, non-repeatable reads, and phantom reads. However, it can have the biggest impact on performance, compared to the other isolation levels. Because it holds a read/write lock for the entire range of respective records.

5. Snapshot: In Read-Committed isolation all respective rows are locked until data gets commit these ensures accuracy but leads degrade in performance (blocking). And in Read-Uncommitted the dirty reads help to faster query executions (higher concurrency) but no accuracy in results. Except Read-Uncommitted, in all isolation's data writers (exclusive lock) block data readers (shared lock) and data readers block data writers. To avoid this Snapshot isolation is introduced. Snapshot isolation levels use a Row Versioning concept; simply it extracts the respective rows (previously committed data) to version store (tempdb) with a time stamp, and allows data read and write operations. Here SQL server will maintains multiple versions of old data from db as long as they needed by SQL server else data removed from the version store.

The difference between Read Committed and Repeatable Read is, in the Read Committed isolation level the shared lock is released once the data gets processed without waiting for transaction commit whereas in Repeatable Read isolation levels the shared lock is held until either by the transaction commits or roll backing. .....(Continuation on next post)

Wednesday, 27 May 2015

Table Partitions

Introduced in SQL Server 2005, before that partitioned views are available. A table partition means “divide a large table physically into one or more file groups in the same database, but logically shown as a single table”.

Advantages:
You can Insert or access the data from a partitioned table quickly and efficiently.
E.g. if you trigger a query, it will perform the actions on respective partitions instead of doing on whole table.

You can perform maintenance operations on one or more partitions more quickly.
E.g. you can perform backup operations on individual file groups or compress the data in one or more partitions.

The query performance will improve.
E.g. when a complex query triggered on a table, instead of involving entire table involves only respective partitions. ..etc

Implementing partition mechanism:
Three components will be involved in a table partition.
1. Partition function
2. Partition scheme
3. Partitioning column

1. Partition Function:
It defines how the rows of a table are divided into multiple file groups based on the values in partitioning column. Here the rows Range will be either LEFT or RIGHT. For example, your table has 3 years of data (1981, 1982, and 1983) and we are dividing the table in to 3 partitions based on the Partitioning column values (HireDate in Employee table). Now your requirement is Partition1 contains the data before 1982, Partition2 contains only 1982, Partition3 contains 1983 and above.

LEFT RANGE: Count the Partitioning column values from LEFT side.

RIGHT RANGE: Count the Partitioning column values from RIGHT side.

2. Partition scheme:

It will group all the file groups and maps Partition Function to all file groups.

3. Partitioning column:

It is a column in the table and it will divide the rows to partitions based on some criteria. For example, Employee table have a column named as “HireDate”, it will divides the rows to partitions based on years. Means Partition1 contains only Employee who joined in 1981, Partition2 contains 1982 and so on.

Example:

 I have created a database Test_DB with 3 data files on 3 file groups. And each file group on different drives

PRIMARY on D drive,

FileGroup2 on E drive,

FileGroup3 on F drive.

CREATE DATABASE [Test_DB]

ON PRIMARY --Default fileGroup ‘Primary’ on D drive

( NAME = N'Test_DB',

FILENAME = N'D:\Radha\Test_DB.mdf' ,

SIZE = 3072KB ,FILEGROWTH = 1024KB ),

FILEGROUP [FileGroup2] --Second fileGroup ‘FileGroup2’ on E drive

(NAME = N'Test_DB1',

FILENAME = N'E:\Radha\Test_DB1.ndf' ,

SIZE = 3072KB ,FILEGROWTH = 1024KB ),

FILEGROUP [FileGroup3] --Third fileGroup ‘FileGroup3’ on F drive

( NAME = N'Test_DB2',

FILENAME = N'F:\Radha\Test_DB2.ndf' ,

SIZE = 3072KB ,FILEGROWTH = 1024KB )

LOG ON ( NAME = N'Test_DB_log', --Log file on T drive

FILENAME = N'T:\Radha\Test_DB_log.ldf' ,

SIZE = 1024KB , FILEGROWTH = 10%)

 I have verified files and file groups information by the below script,

SELECT FG.name as [FileGroupName],DF.name as [DBfileName],DF.physical_name as [DBfilePath]

FROM sys.filegroups FG inner join sys.database_files DF

on FG.data_space_id = DF.data_space_id

WHERE FG.type = 'FG' or DF.type_desc = 'ROWS'

Step1: Create a Partition Function with RIGHT RANGE. So that, the function can divide the rows to respective partitions for each complete year,

Use [Test_DB]

CREATE PARTITION FUNCTION HDFunction (datetime)

AS RANGE RIGHT FOR VALUES ( '1982-01-01', '1983-01-01')

Step2: Create a partition Scheme, it will group all file groups and map the HDFunction to all file groups,

Use [Test_DB]

CREATE PARTITION SCHEME Test_PartitionScheme

AS PARTITION HDFunction

TO ([PRIMARY], [FileGroup2], [FileGroup2])

Step3: Create a partitioned Table, here am going to partition Employee table to 3 files groups on Test_DB as,

Use [Test_DB]

CREATE TABLE [dbo].[Employee]

(EMPNO INT PRIMARY KEY,

ENAME VARCHAR(20),

JOB VARCHAR(20),

MGR INT,

HIREDATE DATETIME,

SAL MONEY,

COMM MONEY,

DEPTNO int);

ON Test_PartitionScheme (HIREDATE);

Step4: Insert values to Employee table with different HIREDATES as,

INSERT INTO [dbo].[Employee] VALUES

(7369, 'SMITH', 'CLERK', 7902, '17-DEC-1980', 800, NULL, 20);