Select Appropriate Data type
I think the selection of the data type for a column can have the most important role in database performance. If we select an appropriate data type then it will reduce the space and enhance the performance otherwise it generates the worst effect. So select an appropriate data type according to the requirements. SQL contains many data types that can store the same type of data but select an appropriate data type because each data type has some limitations and advantages upon another data type.
The following are some guidelines about the selection of data type:
Never use a nvarchar or nchar, instead use a varchar or char because nvarchar and nchar takes a double amount of memory compared to varchar and char. Use nchar and nvarchar when we must store Unicode or 16-bit character data such as Hindi characters.
Avoid use of the text data type instead of varchar because the performance of a varchar is much better than text. Use the text data type when we must store text data of more than 8000 characters.
Never use Select * Statement
When we require all the columns of a table then we usually use a “Select *” statement but this is not a good approach because when we use the “select *” statement then SQL Server convert * into all column names before executing the query and this approach takes some extra time and effort. So always provide all the column names in the query instead of “select *”.
Select Appropriate choice between Exist and IN
I find that many programmers often suggest use Exist instead of IN but I am not satisfied with those people because it is not always correct that we use Exist instead of IN query. It mainly depends upon the amount of data. EXISTS provides the answer in the form of TRUE or FALSE & IN return values.
Now the question develop of which is faster, IN or EXISTS?
It totally depends upon the query. For some queries Exist is efficient and for some IN is efficient.
Exits is a correlated sub-query in which the outer query runs first and for each outer query an inner query is probed.
Whereas in IN the sub-query is evaluated, distinct, indexed and then joined to the original table. So consider a one big table (say 1 million rows) and one small table containing relatively fewer rows.
So if the outer table is small then it will be probed a fewer number of times and with the inner query (a big table) Exist will be faster.
If the inner query table (small table) is giving a smaller result set than IN will be faster. EXISTS will find the first row faster in general than the IN will and the IN will get the LAST row (all rows) faster then the where exists.
The recommendation at that time was:
If the majority of the filtering criteria is in the subquery, use IN.
If the majority of the filtering criteria is in the main query, use EXISTS.
In other words:
IN for a large outer query and small inner query.
EXISTS for small outer query and big inner query.
Never use Count(*) or Count(1)
We should never use Count(*) or Count(1) in SQL . Instead of this we should use Count (Col_Name) .
Exception plan of Count(*), Count(1), Count (Col_Name) are the same.
SELECT COUNT( *) FROM Employee
SELECT COUNT( 1) FROM Employee
SELECT COUNT(Emp_IId) FROM Employee
Never Use ” Sp_” for User Define Stored Procedure
Most programmers use “sp_” for user-defined Stored Procedures. I suggest to never use “sp_” for user-defined Stored Procedure because In SQL Server, the master database has a Stored Procedure with the "sp_" prefix, so when we create a Stored Procedure with the "sp_" prefix then SQL Server always looks first in the master database then in the user-defined database so it takes some extra time.
So we use another prefix for the user-defined Stored Procedure like “usp_” as in the following:
sp_Employees_Insert /* Bad Practice */
usp_Employees_Insert /* Good Practice */
Avoid Cursors
A cursor is a temporary work area created in the system memory when a SQL statement is executed. A cursor is a set of rows together with a pointer that identifies a current row. It is a database object to retrieve data from a result set one row at a time. But use of a cursor is not good because it takes a long time because it fetches data row by row.
So we can use a replacement of cursors. A Temporary table, For or While loop may be a replacement of a cursor in some case.
Use Normalization
In creating a database, normalization is the process of organizing it into tables in such a way that the results of using the database are always unambiguous and as intended. It consists of decomposing tables to eliminate data redundancy and undesirable characteristics like insert, update and delete anamolies.
We should use the following 4 normalization rules in the creation of a database:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce & Codd Normal Form (BCNF)
Use Try–Catch
In T-SQL a Try-Catch block is very important for exception handling. A best practice and use of a Try-Catch block in SQL can save our data from undesired changes. We can put all T-SQL statements in a TRY BLOCK and the code for exception handling can be put into a CATCH block.
Use Exception Handling for the following:
In Transaction Management for Rollback the transaction
When using Cursor in SQL Server
When implement DML Query (Insert,Update,Delete) for checking of error and handle them.
Example
Begin Transaction Trans
Begin Try
Delete From Employee Where Employee.Emp_IID<3
Update Employee Set Employee.First_Name='Pankaj kumar' Where Employee.Emp_IID='6th' /* Error Will Occur Here */
If @@TranCount>0
begin Commit Transaction Trans
End
End Try
Begin Catch
if @@TranCount>0
Print 'Error Is Occur in Transaction'
begin Rollback Transaction Trans /* RollBack Occur */
End
End Catch
Avoid Multiple Joins
Try to avoid writing a SQL query using multiple joins that includes outer joins, cross apply, outer apply . It reduce the speed of execution and reduces the choices for Optimizer to decide the join order and join type. We can use temp table or temp variables instead of Multiple Joins.
Create and Use the Index
An index is a data structure to retrieve fast data. Indexes are special lookup tables that the database search engine can use to speed up data retrieval. Simply put, an index is a pointer to data in a table. Mainly an index increases the speed of data retrieval.
There are the following two types of indexes in SQL.
Clustered Index
A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore a table can have only one clustered index and this is usually made on the primary key. The leaf nodes of a clustered index contain the data pages.
Non-Clustered Index
A non-clustered index, on the other hand, does not alter the way the rows are stored in the table. It creates a completely different object within the table that contains the column(s) selected for indexing and a pointer back to the table's rows containing the data. A non-clustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk.
Create Nonclustered Index NonClusteredIndex
on Employee_Detail(Emp_Name,Emp_Age) /* Non Clustered Index */
Create Clustered Index My_ClusteredIndex
on Employee_Detail(Emp_IId) /* Clsutered Index */
We should use the index in the table.
Use Primary Key
We should create a primary key on each table. It has two benefits. The first is that it uniquely identifies each record in the table and second it creates a clustered index that stores the data in the form of a B-Tree. So due to this approach the retrieval of the data is very fast.
Column Level
Create Table Table_Name
(
Column_Name Datatype Constraint Constraint_Name Primary Key,
)
Example
Create Table Employee
(
IId int constraint Const_primary_IId primary key,
Name nvarchar(50)
)
Table Level
Alter Table Table_Name
Add constraint Constraint_Name Primary Key(Column_Name)
Example
Alter Table Employee
Add constraint Constraint_Name Primary Key(Emp_No,Salary)
Use Alias NameAliasing renames a table or a column temporarily by giving another name. The use of table aliases means to rename a table in a specific SQL statement. Using aliasing, we can provide a small name to large name that will save our time.
SELECT Employee.Emp_IId,Employee.First_Name,Employee.Last_Name ,Employee.Salary FROM Employee /*Bad Practice */
SELECT e.Emp_IId,e.First_Name,e.Last_Name,e.Salary FROM Employee e /* Good Practice */
Use Stored Procedure
We should be use a Stored Procedure for repeated data. Stored Procedures are compiled once and stored in executable form, so procedure calls are quick and efficient. Executable code is automatically cached and shared among users. The code for any repetitious database operation is the perfect candidate for encapsulation in procedures. This eliminates needless rewrites of the same code, decreases code inconsistency and allows the code to be accessed and executed by any user or application possessing the necessary permissions.
Use Transaction Management
A transaction is a unit of work performed against the database. A transaction is a set of work (T-SQL statements) that execute together like a single unit in a specific logical order as a single unit. If all the statements are executed successfully then the transaction is complete and the transaction is committed and the data will be saved in the database permanently. If any single statement fails then the entire transaction will fail and then the complete transaction is either cancelled or rolled back.
A transaction mainly contains 4 properties that are also known as ACID rules.
Atomicity: Atomic means that all the work in the transaction is treated as a single unit.
Consistency: Transaction ensures that the database properly changes states upon a successfully committed transaction.
Isolation: It ensure that transactions to operate independently and transparent to each other.
Durability: It ensures that the effect of a committed transaction will be saved in the database permanently and should persist no matter what (like due to power failure or something).
Begin Transaction in SQL Server
In the next article we will learn some other concepts to increase the performance of a SQL Server database.