Friday, 20 January 2017

Memory and Indexes



Generally in a computer, Data is stored on the disc in the form of zeros and ones (1, 0). The disc is either made with non-magnetic material, which is coated with a thin layer of magnetic material (CD/DVD’s) or solid-state drives (Memory Cards/HDD). In the magnetic disks data is stored by magnetizing the thin film and spinning the disk. And in the solid-state drives data is stored eclectically.  BIT is a basic storage area on the disc. A bit can store either zero or one (0 or 1). If that area (or bit) electrically ON then it is treated as 1 else treated as 0. Each and every character from the computer keyboard internally converted to ASCII code, and next to machine understandable (binary) language i.e. 1's and 0's. Operating system is a set of programs, that can convert High-level language (Human understandable language e.g. English, C) to Low-level language (Machine understandable language i.e. binary) and vice versa.

For example,

8 bits                 = 1 byte
1024 bytes         = 1 KB (Kilo Bytes)
1024 KB’s        = 1 MB (Mega Byte)
1024 MB’s        = 1 GB (Giga Byte)
1024 GB’s        = 1 TB (Terabytes)
1024 TB’s        = 1 PB (Peta byte)
1024 PB’s        = 1 EB (Exabyte)

Data in SQL Server is arranged in the form of Pages and Extents.

A Page is basic storage area in SQL Server, having 8 KB size on the disc, in that 96 bytes allocated to page header, and which contains information about the page like page number, page type, the amount of free space on the page, and the allocation unit ID of the object that owns the page etc.

A table is a collection of Rows and Columns. When you insert Rows (logical sequence) in to a table those rows internally spread over to multiple pages (physical sequence) on the disk. A row stores all respective columns either by value or by reference. One or more rows are sequentially packed into a page, once the page is full continues to next page. Like this as many as possible rows are fit into a page. When a row needs to be accessed, the respective page copied into buffer memory. Here the deviation between logical and physical sequence is treated as 'External Fragmentation'. And the percentage of empty space within a page treated as 'Internal Fragmentation' of that page.

Extents are a collection of eight physically contiguous pages and are used to efficiently manage the pages. All pages are stored in extents.



 INDEX
  •  When there are thousands of records in a table, retrieving data will take a long time. By using INDEX we can increase the performance of retrieving data from database.
  •  Indexes speed up the querying process by providing fast access to rows in the data tables, similarly to the way a book’s index helps you find information quickly within that book.
  • Therefore indexes are created on columns which are accessed frequently, so that the information can be retrieved quickly.
  •  Indexes can be created on a single column or a group of columns.

 HEAP: A table that does not have an Index is called a HEAP.

 Indexes are introduced in SQL Server to find or modify requested data in minimum amount of time using few system resources to achieve max performance. In general SQL Server has only two possible ways to access the requested data from the database.
    1.  It can scan each and every row from the beginning to end, if it meet the requested criteria then it fetch the requested data.
    2. Or if an index available, it uses that index to locate the requested data.


Indexes took the space on the disk and must be kept in sync with tables. SQL Server indexes are implemented a type of B-TREE structure. The B-TREE architecture consist ROOT level, NON-LEAF level and LEAF level.

  • Every Index has a single ROOT level on top used as starting point for traversing an index.
  • All the index levels above the LEAF level including the ROOTs known as the NON-LEAF levels.
  • The LEAF level is the bottom level of the index structure it contains Key-Value index entries that either reference the rows in the data pages or complete data rows.


SQL Server supports two types of Indexes, Clustered and non-Clustered. Both are having the similar B-Tree structure.

Clustered index: Here the LEAF level has the actual data pages from the table sorted in ascending order. The order of the values in the index pages also in ascending. There can be only one clustered index per table, and might be on one or more columns. SQL Server internally maintains uniqueness of key values for each row even if the column data is not unique. For example in a Clustered Index column data inserted ‘RADHA’ as first instance, here all key values are unique. When subsequent values of ‘RADHA’ are inserted again SQL Server generates an internal number to maintain uniqueness of key-value pair.

Non clustered index: Non clustered index has the same B-Tree structure as clustered index with a difference. The LEAF level of Non-Clustered index contains key values instead of actual data. These key-values are pointed to Clustered Index keys (if that table has the Clustered Index) or the rows in the data pages.

The implementation a Non-Clustered index depends on whether the data pages of a table as a HEAP, or as a Clustered Index. It means that if the table does not have a Clustered Index the LEAF level pointed to rows in the data pages. If the table has a Clustered Index SQL Server builds the Non-Clustered Index on top on Clustered Index so that Non-Clustered LEAF level pages use the Clustered Index.

(Continue reading on next post…)