Friday 26 December 2014

How many rows are in each table?


select object_name(id) table_name, rows
from sys.sysindexes
where xmaxlen in ('89', '92')
--and object_name(id) = 'sometablename'
order by rows --desc
--order by table_name desc
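The xmaxlen filter above is specific to the tables I was interested in. For a general per-table row count, here's a minimal alternative sketch using sys.partitions (counting only heaps and clustered indexes so each table is counted once):

-- row count per table from sys.partitions
select object_name(object_id) table_name, sum(rows) as [rows]
from sys.partitions
where index_id in (0, 1)   -- 0 = heap, 1 = clustered index
group by object_id
order by [rows] desc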

Alert on low disk space


Here’s a script I wrote for SQL 2008 R2, to run hourly in a SQL Agent job and email me if disk space is getting low.
-- space_alert.sql

     declare @trig int, @drive varchar(50), @free varchar(50), @email varchar(300), @operator varchar(50)
     set @trig = 15                -- alert when free space falls below this percentage
     set @operator = 'some name'   -- SQL Agent operator to notify

-- capture free-space to temp-table

     select volume_mount_point drive, cast(sum(available_bytes)*100 / sum(total_bytes) as int) Free
     into #space
     from sys.master_files f
     cross apply sys.dm_os_volume_stats(f.database_id, f.[file_id])
     group by volume_mount_point

-- loop through the table row-by-row

     while (select count(*) from #space) > 0
     begin
          set @drive = (select top 1 drive from #space order by drive)
          set @free = (select top 1 free from #space order by drive)

-- and send email if space low

          if @free < @trig
          begin
               set @email = 'EXECUTE msdb.dbo.sp_notify_operator @name=N''' + @operator + ''', @subject=N''Space Alert (SomeServer ' + @drive + @free + '%)'',@body=N''On someserver The Volume ' + @drive + ' has only ' + @free + '% free space.'''
               exec(@email)
          end

-- then remove current line from table

          delete from #space where drive = @drive
     end
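To check that the operator and Database Mail are wired up before scheduling the job, a one-off test notification can be sent directly (swap in your own operator name):

-- quick test of the operator notification
EXECUTE msdb.dbo.sp_notify_operator
     @name = N'some name',
     @subject = N'Space Alert (test)',
     @body = N'If you are reading this, the operator and Database Mail are working.'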

Max/Min Server Memory


To optimize Max Server Memory on a SQL 2008 R2 production cluster running on Windows 2008 R2 with 32GB of memory, I started PerfMon on the server and monitored Memory\Available Bytes.
I slowly increased Max Server Memory until, at 30GB, Available Bytes dipped below 300MB, then reduced it a tad to 29GB.
I set Min Server Memory 2GB lower, at 27GB.
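For reference, here's a sketch of setting the same values with sp_configure rather than the GUI (29696 MB and 27648 MB are just the 29GB and 27GB figures from above):

-- set max/min server memory (values are in MB)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 29696;   -- 29GB
EXEC sp_configure 'min server memory (MB)', 27648;   -- 27GB
RECONFIGURE;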

Hide SQL Instance


I found that even though I had the ‘hide’ flag set to ‘yes’ on my laptop, its SQL instance would become visible on the network after every reboot.
The fix seems to be to change the sql-service properties from ‘automatic’ startup-type to ‘manual’.
After that, whenever I started SQL the ‘hide’ flag would be read and applied correctly.

Dropping all Temp tables

I wanted to drop all my temp tables at the top of a large SELECT statement and already had a query to drop a Named temp-table …
IF OBJECT_ID('tempdb..#Temp2') IS NOT NULL DROP TABLE #Temp2
… so just needed to make that dynamic. Looking at the temp tables …
select * from tempdb.sys.objects where name like '#%'
… I noticed all mine had a bunch of underscores and a GUID appended to the name. And the GUID always contained at least five zeros. So I was able to filter out just mine with
select name from tempdb.sys.objects where name like '#000%'
… and paste that into a while loop to drop them …
--first drop all temp tables
IF OBJECT_ID('temptables') IS NOT NULL DROP TABLE temptables
select name into temptables from tempdb.sys.objects where name like '#000%'

declare @table varchar(max), @cmd varchar(max)
while (select count(*) from temptables) > 0
begin
 set @table = (select top 1 name from temptables)
 set @cmd = 'IF OBJECT_ID(''tempdb..' + @table + ''') IS NOT NULL DROP TABLE ' + @table
 exec(@cmd)
 delete from temptables where name = @table
end

DROP TABLE temptables
By initially creating a list and then working through it deleting the temp-table and its entry in the list, I ensured there would be no endless looping should it try (and fail) to drop a temp-table I don’t have permissions on.

Empty a Database


To stop an import loading data that was already there (for a second time), I needed to empty every table. It’s a very, very good idea to back up your databases before using this code. It empties a whole database in one go!
-- empty database
use SomeDatabaseName -- ** change to subject database first & *** CHECK SERVER NAME **
go
EXEC sp_MSForEachTable 'TRUNCATE TABLE [?]'
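Note that TRUNCATE TABLE fails on any table referenced by a foreign key. On databases with foreign keys, a hedged alternative is to disable the constraints, DELETE instead, and re-enable the constraints afterwards (slower, fully logged, and unlike TRUNCATE it doesn’t reset identity seeds):

-- alternative when foreign keys block TRUNCATE
EXEC sp_MSForEachTable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL'
EXEC sp_MSForEachTable 'DELETE FROM ?'
EXEC sp_MSForEachTable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL'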

Autoshrink


--autoshrink.sql

-- list databases where autoshrink is ON
select *
from sys.databases
where is_auto_shrink_on != 0

-- make script to turn autoshrink OFF
SELECT 'ALTER DATABASE [' + name + '] SET AUTO_SHRINK OFF WITH NO_WAIT'
FROM sys.databases
WHERE database_id > 4
and is_auto_shrink_on != 0

CHECK DB found Corruption!


Looking over a new SQL Server I noticed a failing job. It was the ‘Check DB’ part of a maintenance-plan and contained this error message …
Msg 2508, Level 16, State 3, Line 1
The In-row data RSVD page count for object “SomeTable”, index ID 0, partition ID 104586854531027, alloc unit ID 104586854531027 (type In-row data) is incorrect. Run DBCC UPDATEUSAGE.
CHECKDB found 0 allocation errors and 1 consistency errors in table ‘SomeTable’ (object ID 1595868725).
It transpired that this box was once upgraded from SQL-2000 and some steps had been missed.
It was trivial to fix (and not a sign of corruption) …
DBCC CHECKDB WITH DATA_PURITY;
GO
EXEC sp_MSForEachDB 'DBCC UPDATEUSAGE(?);';

Database Restore Msg 3634 Error ’32

I kept getting an odd error message …
Msg 3634, Level 16, State 1, Line 3
The operating system returned the error ’32(The process cannot access the file because it is being used by another process.)’ while attempting ‘RestoreContainer::ValidateTargetForCreation’ on ‘E:\Website.mdf’.
blah blah blah
… while executing this code (simplified) …
RESTORE DATABASE [Website]
FROM DISK = N'E:\backups\Website.bak' WITH FILE = 1,
MOVE N'Website' TO N'D:\Website.mdf',
MOVE N'Website_Indexes' TO N'D:\Website_1.ndf',
MOVE N'Website_Customers' TO N'D:\Website_2.ndf',
MOVE N'Website_log' TO N'D:\Website_3.ldf',
NOUNLOAD, REPLACE, STATS = 1
GO
The solution was to use a double-backslash in the path to the mdf file like so …
RESTORE DATABASE [Website]
FROM DISK = N'E:\\backups\Website.bak' WITH FILE = 1,
MOVE N'Website' TO N'D:\Website.mdf',
MOVE N'Website_Indexes' TO N'D:\Website_1.ndf',
MOVE N'Website_Customers' TO N'D:\Website_2.ndf',
MOVE N'Website_log' TO N'D:\Website_3.ldf',
NOUNLOAD, REPLACE, STATS = 1
GO

Tuesday 23 December 2014


How to create, find and eliminate deadlocks


Deadlocks are typically encountered in high-load scenarios or in complex concurrent transactions. And the worst part is, you probably won’t see them until you’re live in production. Luckily, finding and solving deadlock issues isn’t as complicated as it may sound. All it takes is a bit of logging and patience, along with some basic know-how of how locking mechanisms work in SQL Server.

What’s a deadlock?

A deadlock occurs when two processes each hold a lock on a resource that the other one needs – and both are waiting for the other to finish.
Imagine the following example of a banking database: Alice and Bob have bank accounts, and they want to send each other money. The process of sending money from one account to another involves two bookings that have to be atomic, i.e. run in a single transaction: if the server or network connection failed after only half the operation had completed, the balances would be off, and somebody would probably end up having a pretty bad day.
Alice starts off by moving $1000 to Bob’s account: (process 1)
BEGIN TRANSACTION;

   UPDATE banking.accounts
   SET balance=balance-1000
   WHERE account='Alice';

   UPDATE banking.accounts
   SET balance=balance+1000
   WHERE account='Bob';

COMMIT TRANSACTION;
But what if Bob, only a millisecond after Alice, clicks the “Send” button on his Internet banking web page, to send $500 to Alice? The code would look like this: (process 2)
BEGIN TRANSACTION;

   UPDATE banking.accounts
   SET balance=balance-500
   WHERE account='Bob';

   UPDATE banking.accounts
   SET balance=balance+500
   WHERE account='Alice';

COMMIT TRANSACTION;
Here’s a timeline of what happens:
  • Alice takes $1000 from her account (process 1). This creates a lock on her account (so nobody else can change the balance, in case we need to roll back the transaction later).
  • Bob takes $500 from his account (process 2), placing a lock on his account.
  • Alice (process 1) is now waiting for Bob (process 2) to commit his transaction, so that the lock on Bob’s account is released, before she can complete the transfer.
  • At the same time, Bob (process 2) is waiting for Alice (process 1) to commit her transaction.
In the end, both Alice and Bob are waiting for each other to complete their respective transactions, and they could technically wait forever. This is where the deadlock monitor, a system process in SQL Server, steps in and kills (rolls back) the process that has done the least amount of work. The killed process, known as the deadlock victim, receives the following error message:
Msg 1205, Level 13, State 51, Line 1
Transaction (Process ID 73) was deadlocked on lock resources with
       another process and has been chosen as the deadlock victim.
       Rerun the transaction.
In the example above, the objects of the deadlock are two rows in the same table, but the object of a deadlock can be a number of different things – tables, pages, rows, schemas, locks, even chunks of memory or worker threads. However, we’re going to limit our scope to the most common kind, database objects.

Using SQL Server Profiler to identify deadlocks

A trace in SQL Server Profiler can give you details as to what objects are deadlocked. Start off from a blank template, then, under “Locks”, check the “Deadlock graph”, “Lock: Deadlock” and “Lock: Deadlock Chain” events.
You can run this trace interactively (with the Profiler window open) or in the background, as a server-side trace if you want to keep logging for a longer time. Just remember to stop your server-side trace once you’re done. A lot of people will tell you that a long-running trace takes a lot more resources if it runs interactively in Profiler than as a background trace, but I haven’t verified this myself.
Once you find deadlock events, you can view the dependency graph in Profiler to find out which objects were involved. From that, you can hopefully deduce what procedure or queries to tune. If you can’t, simply add those events as well, but be aware that your trace log can fill up pretty fast if you do.
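If running a Profiler trace isn’t practical, another common way to capture the same information is to write deadlock details to the SQL Server error log with trace flag 1222. It’s a global setting, so remember to turn it off again when you’re done:

-- log deadlock details to the SQL Server error log
DBCC TRACEON (1222, -1);
GO
-- ... and later, to stop logging:
DBCC TRACEOFF (1222, -1);
GO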

How to avoid deadlocks

As you’ll understand by now, deadlocks are naturally occurring in databases, and the more transactions you run through it, the more likely a deadlock is to happen. So mostly, there is no sure-fire way of completely eliminating deadlocks. The best you can do is a kind of “harm reduction”, where you a) minimize the number of deadlocks, and b) handle those that happen, by trapping deadlocks and rerunning the transactions.

Guidelines for minimizing the number of deadlocks

In my personal order of preference:
  • Deadlocks (nearly) always happen in SQL transactions. Time is a factor, so keep transactions as short as possible. Optimize indexes for performance on critical tables. And never, ever, leave an uncommitted transaction waiting for user input.
  • For all operations that are atomic, keep the size of the operation to an absolute minimum, as far as your situation allows for.
  • Use locking hints to place more specific locks. Lock all your resources as quickly as possible.
  • Make sure all your code places locks in the same order.
  • Try different isolation levels (see the sketch right after this list).
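On the isolation-level point, one option that often helps with read/write conflicts is snapshot isolation. A minimal sketch, assuming the banking example lives in a database called SomeDatabaseName:

-- one-off: allow snapshot isolation in the database
ALTER DATABASE SomeDatabaseName SET ALLOW_SNAPSHOT_ISOLATION ON;
GO

-- then, per session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
   SELECT balance FROM banking.accounts WHERE account='Alice';
COMMIT TRANSACTION;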

How to handle a deadlock

Deadlocks can be trapped using a TRY-CATCH block. If your transaction is deadlocked, it is rolled back to the point of the BEGIN TRANSACTION statement.
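Here’s a minimal retry sketch built around the transfer from the example above. Error 1205 is the deadlock-victim error; everything else is re-raised (THROW needs SQL Server 2012 or later – use RAISERROR on older versions):

DECLARE @retries int = 3;

WHILE @retries > 0
BEGIN
   BEGIN TRY
      BEGIN TRANSACTION;

         UPDATE banking.accounts SET balance=balance-1000 WHERE account='Alice';
         UPDATE banking.accounts SET balance=balance+1000 WHERE account='Bob';

      COMMIT TRANSACTION;
      SET @retries = 0;                     -- success, stop looping
   END TRY
   BEGIN CATCH
      IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
      IF ERROR_NUMBER() = 1205
         SET @retries = @retries - 1;       -- deadlock victim: try again
      ELSE
         THROW;                             -- anything else: re-raise
   END CATCH
END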

Locking order

If you always lock objects in the same order, you will significantly reduce the risk of those objects being part of a deadlock. To look at the banking example above, if we always make the transfers in the alphabetical order of the account holder’s name, here’s how Bob’s transfer (process 2) would have looked:
BEGIN TRANSACTION;

   -- 'Alice'<'Bob', so her account goes first:
   UPDATE banking.accounts
   SET balance=balance+500
   WHERE account='Alice';

   UPDATE banking.accounts
   SET balance=balance-500
   WHERE account='Bob';

COMMIT TRANSACTION;
With this new setup, let’s go through the timeline again:
  • Alice takes $1000 from her account (process 1). This creates a lock on her account.
  • Bob (process 2) waits for Alice.
  • Alice adds $1000 to Bob’s account (process 1).
  • Alice (process 1) commits her transaction.
  • Bob (process 2) now adds $500 to Alice’s account.
  • Bob (process 2) removes $500 from his account.
  • Bob commits his transaction.

Locking hints

Using locking hints, it’s possible to tell SQL Server what kind of locks to place. This is practical when you’re dealing with lock conversion and escalation. Lock conversion occurs when a lock is changed to a more restrictive type, for instance from a shared lock to an update or exclusive lock; lock escalation is when many fine-grained locks (rows or pages) are replaced with a single coarser one, such as a table lock. Both happen all the time, and mostly, SQL Server gets it right. However, it can only guess what you want to accomplish, so you can help it along by explicitly specifying what types of locks and what granularity you want.
Depending on your query, an aggressive lock (exclusive table or schema lock) may work better or worse than a relaxed strategy (intent shared, row lock). The best strategy is very dependent on how your transaction looks and how much work it does.
Locking hints are defined with the tables in a query:
...
FROM dbo.table_a AS a WITH (TABLOCKX, HOLDLOCK)
LEFT JOIN dbo.table_b AS b WITH (NOLOCK) ON ...
Here are some common locking hints
  • TABLOCK and TABLOCKX: Table lock and exclusive table lock. One lock is placed on the entire table.
  • PAGLOCK: Places a page lock. A page is an 8 kB unit of storage.
  • ROWLOCK: Locks individual rows. Very fine-grained, should only be used on very small quantities of data.
  • UPDLOCK: Update lock, allows other processes to view locked data, but not change it.
  • NOLOCK: Applies to SELECT statements only. Does not place any lock and does not honor any existing locks from other processes.
A locking hint-based solution to the deadlock example above could be placing initial locks on both rows before performing the two updates:
BEGIN TRANSACTION;

   --- Place a row lock on each of the two rows and hold these
   --- locks for the remainder of the transaction.
   DECLARE @dummy money;

   SELECT @dummy=SUM(balance)
   FROM banking.accounts WITH (ROWLOCK, HOLDLOCK)
   WHERE account IN ('Alice', 'Bob');
   UPDATE banking.accounts
   SET balance=balance-1000
   WHERE account='Alice';

   UPDATE banking.accounts
   SET balance=balance+1000
   WHERE account='Bob';

COMMIT TRANSACTION;
This way, both accounts are locked the instant we start the procedure. The HOLDLOCK hint makes sure that the two locks are held until we commit or roll back the transaction.

Deadlock priority

Normally, when a deadlock occurs, the transaction that has accomplished the least amount of work is the one that is chosen as the deadlock victim, i.e. rolled back with an error message.
However, you can assign different priorities to different transactions, depending on what they do or how critical they are.
SET DEADLOCK_PRIORITY HIGH
Valid parameters are LOW, NORMAL, HIGH, or numeric values between -10 and 10. A lower number or LOW means that there is a higher probability that the transaction will be selected as the victim in a deadlock.

Rebuilding a transaction log file


The database’s transaction log file contains, like the name implies, a log of all the transactions happening in the database. If you’ve set your database to the “simple” recovery model, the log is truncated (emptied) at every checkpoint (more or less all the time). In “bulk logged” or “full” recovery model, you’ll have to truncate the log files yourself, by performing a transaction log backup.
Because of this, the log file has to stay in sync with the rest of the database at all times. But what do you do if your log file is lost or corrupted? Luckily, there’s a simple way to recover it.

I’ve discovered two ways to remedy this problem, for two different scenarios.

The simple option: Attaching a clean database

If your database was shut down cleanly and you have all the database files (except the log file, of course), you can just attach the database using the CREATE DATABASE statement with FOR ATTACH_REBUILD_LOG.
It’ll look something like this:
CREATE DATABASE DatabaseNameHere
ON (FILENAME='D:\File1.mdf'),
   (FILENAME='D:\File2.ndf'),
   (FILENAME='D:\File3.ndf'),
   (FILENAME='D:\File4.ndf')
FOR ATTACH_REBUILD_LOG;
This basically attaches the database from the MDF/NDF files, and creates a blank, 1 MB large log file from scratch in the default log directory of your server. This is by far the easiest and least risky way to recreate a new log file.
This will of course break the backup chain, but then again, if you’ve lost your log file there’s really not much else you could do anyway.

The tricky option: database is suspect

In some situations, a corrupt log file will render the database “suspect”. First off, change the database mode to “emergency” and to single user mode using the following script. I’ve added “read-write” as well, in case the database is set as read-only:
ALTER DATABASE databaseName SET EMERGENCY;
ALTER DATABASE databaseName SET SINGLE_USER;
ALTER DATABASE databaseName SET READ_WRITE;
If the database is now in single-user, emergency mode, you can run a DBCC command on it. The following DBCC operation not only creates a new log file, but also checks the database for consistency errors, so it may take a while to run, depending on the size of your database.
DBCC CHECKDB (databaseName, REPAIR_ALLOW_DATA_LOSS);
With a little bit of luck, the operation will succeed, and your heart rate and blood pressure will go down a notch. Now, all you have to do is set the database back online and for multi-user use.
ALTER DATABASE databaseName SET MULTI_USER;
ALTER DATABASE databaseName SET ONLINE;
I don’t really have loads of experience of crashing log files, though a… uhm… friend of mine managed to lose a log file once. I’ve tried the solution above, and it worked for me once. Your mileage may vary, though.
Best of luck, anyway!

Introduction to partitioning


Partitioning is not only a great way to improve performance in large tables, but you can also use it to manage the physical storage of different chunks of data without keeping it in different tables.

First off, to use partitioning on SQL Server, you need the Enterprise Edition, which comes with a hefty price tag and as such is really only suited for more demanding setups.

How partitioning works

Partitioning data means that you split the physical storage of it into different filegroups (which in turn typically means different physical storage locations). This creates a number of performance benefits, because although the table (for instance) is still visible as one table with data, in actual fact it is stored physically as a number of smaller tables. And smaller tables mean better performance.
You can partition a table, a heap or an index. When you partition a table (with a clustered index) it’s actually the clustered index, which makes up the physical storage of the table, that is being partitioned.
Some of the benefits of partitioning your data are obvious, others not:
  • Using partition elimination, SQL Server can query a single partition to increase performance.
  • You can apply different data compression to different partitions, depending on your needs.
  • Data that is frequently updated can reside on storage with good write performance, while older data that typically doesn’t change can be stored on disks with better read performance.
  • Self-joins (actually, “equi-joins”) can be performed more efficiently because two partitions can be treated as separate tables.
  • Partitioned data allows for partition switching – an advanced and very, very efficient way to move entire partitions of data between two identical tables/partitions.

Aligned indexes

When you have a partitioned table, you can partition the table’s non-clustered indexes as well. This does not happen automatically, and you can choose different partition schemes for the table and its indexes independently of each other.
However, if you choose a partition scheme for a non-clustered index that is essentially equivalent to that of the table, that index is referred to as being “aligned” to its table, which brings a few performance advantages which I won’t go into here, mostly related to switching or moving data in and out of partitions.

Creating a partition function

The first thing you need is a partition function. This is basically the definition of which rows end up in which partition.


--- Create the partition function:


CREATE PARTITION FUNCTION fn_part_left(int) AS RANGE LEFT FOR VALUES (100, 110, 120, 130);
The “boundary values” for this partition function are 100, 110, 120 and 130. This means that these values are the lower and upper boundaries of each partition.
A partition function can either be RANGE LEFT or RANGE RIGHT. Whether a partition function is “left” or “right” defines on which partition to store values that are exactly equal to a boundary value:
value     RANGE LEFT partition     RANGE RIGHT partition
 98               1                        1
 99               1                        1
100               1                        2
101               2                        2
102               2                        2
...
128               4                        4
129               4                        4
130               4                        5
131               5                        5
132               5                        5
The rows for the boundary values (100 and 130) are where the two function types differ.
For a RANGE LEFT partition function, the first partition will include values up to and including the first boundary value and the last partition will include values above the last boundary value.
For a RANGE RIGHT partition function, on the other hand, the first partition will include values below the first boundary value and the last partition will include values starting at the last boundary value.


To test which partition a given value would end up in, you can use the special $PARTITION function, like this:


SELECT $PARTITION.fn_part_left(123);


.. where fn_part_left, in this case, is the partition function, and 123 is the value we want to test. You can also prefix $PARTITION with a specific database name.
SELECT databaseName.$PARTITION.partFunction(123);

Creating a partition scheme

Second, you need a partition scheme which defines which filegroups are used for each partition in the partition function.
--- Create the partition scheme:
CREATE PARTITION SCHEME ps_part_left AS
PARTITION fn_part_left TO
    ([GROUP_A], [GROUP_B], [GROUP_C], [GROUP_A], [GROUP_B]);
Remember, you need one more partition than the number of partition boundaries. So if you have four boundaries, that means you need five partitions. If you want to, you can define all the partitions on the same filegroup using the ALL keyword.
--- Create the partition scheme,
--- all partitions on the same filegroup:
CREATE PARTITION SCHEME ps_part_left AS
PARTITION fn_part_left ALL TO ([PRIMARY]);
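The filegroups named in the first scheme have to exist (each with at least one file) before the scheme can use them. A hedged sketch, with made-up file names and paths:

--- Create the filegroups and add one file to each:
ALTER DATABASE SomeDatabaseName ADD FILEGROUP [GROUP_A];
ALTER DATABASE SomeDatabaseName ADD FILE
    (NAME='group_a_1', FILENAME='D:\Data\group_a_1.ndf') TO FILEGROUP [GROUP_A];
--- ... and repeat for GROUP_B and GROUP_C.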

Creating a partitioned table or index

When creating tables and indexes, you declare the partition scheme and partitioning column where you would otherwise declare a filegroup, using the ON keyword:
--- Create a table on the partition scheme:
CREATE TABLE dbo.tbl_part_left (
     i      int NOT NULL,
     PRIMARY KEY CLUSTERED (i)
) ON ps_part_left(i);
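To align a non-clustered index with this table (as discussed earlier), create it on the same partition scheme and partitioning column. The index name here is made up for the example, and on a single-column table like this it’s purely illustrative:

--- Create an aligned non-clustered index:
CREATE NONCLUSTERED INDEX IX_tbl_part_left_i
    ON dbo.tbl_part_left (i)
    ON ps_part_left(i);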

Partitioning an existing table or index

An existing index or an existing table with a clustered index can be moved to a partition scheme using CREATE INDEX with DROP_EXISTING.
--- Create a standard, unpartitioned table:
CREATE TABLE dbo.tbl_unpart (
    i          int NOT NULL,
    CONSTRAINT PK_tbl_unpart PRIMARY KEY CLUSTERED (i)
) ON [PRIMARY];

--- Populate the table with some example data:
INSERT INTO dbo.tbl_unpart (i)
VALUES (98), (99), (100), (101), (102),
       (103), (104), (105), (106), (107),
       (108), (109), (110), (111), (112),
       (130), (131), (132), (133);

--- Move the existing table to a partition scheme using
--- CREATE INDEX with DROP_EXISTING:
CREATE UNIQUE CLUSTERED INDEX PK_tbl_unpart ON dbo.tbl_unpart (i)
        WITH (DROP_EXISTING=ON)
    ON ps_part_left(i);
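To check where the rows actually ended up, you can group on the $PARTITION function, or read the row counts straight from sys.partitions:

--- Rows per partition, according to the partition function:
SELECT $PARTITION.fn_part_left(i) AS partition_number, COUNT(*) AS row_count
FROM dbo.tbl_unpart
GROUP BY $PARTITION.fn_part_left(i)
ORDER BY partition_number;

--- ... or from the metadata:
SELECT partition_number, rows
FROM sys.partitions
WHERE object_id=OBJECT_ID('dbo.tbl_unpart')
ORDER BY partition_number;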
This was a very short introduction to partitioning, but as you can see, it’s a fairly deep topic, and there’s lots of room for discussions like partition switching, merging, compression, etc. Stay tuned for more!

Shrinking tempdb without restarting SQL Server


Ok, so even if you’re a seasoned veteran T-SQL coder, at some time you will write a query that runs away and supersizes the tempdb database. This, in turn, might fill up your disk and cause other server-related problems for you. At that point, you may find out the hard way that shrinking tempdb isn’t like shrinking any other database.


Here are some tricks that I’ve tried successfully – but bear in mind that your mileage may vary.

Tempdb stores temporary tables as well as a lot of temporary (cached) information used to speed up queries and stored procedures. For the best chances in shrinking tempdb, we’re going to clear these different caches (except for the temp tables, which you should drop manually).

First off, the easy way out

It’s worth mentioning. If you’re not running a production-like environment, your best bet is to restart the SQL Server service. This will return tempdb to its default size, and you won’t have to worry about all the potential pitfalls of this article. But since you’re reading this, chances are you can’t just restart the server. So here goes:
Warning: These operations remove all kinds of caches, which will impact server performance to some degree until they’ve been rebuilt by the SQL Server. Don’t do this stuff unless absolutely necessary.

DBCC DROPCLEANBUFFERS

Clears the clean buffers. This will flush cached indexes and data pages. You may want to run a CHECKPOINT command first, in order to flush everything to disk.
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO

DBCC FREEPROCCACHE

Clears the procedure cache, which may free up some space in tempdb, although at the expense of your cached execution plans, which will need to be rebuilt the next time. This means that ad-hoc queries and stored procedures will have to recompile the next time you run them. Although this happens automatically, you may notice a significant performance decrease the first few times you run your procedures.
DBCC FREEPROCCACHE;
GO

DBCC FREESYSTEMCACHE

This operation is similar to FREEPROCCACHE, except it affects other types of caches.
DBCC FREESYSTEMCACHE ('ALL');
GO

DBCC FREESESSIONCACHE

Flushes the distributed query connection cache. This has to do with distributed queries (queries between servers), but I’m really not sure how much space they actually take up in tempdb.
DBCC FREESESSIONCACHE;
GO

.. and finally, DBCC SHRINKFILE

DBCC SHRINKFILE is the same tool used to shrink any database file, in tempdb or other databases. This is the step that actually frees the unallocated space from the database file.


Warning: Make sure you don’t have any open transactions when running DBCC SHRINKFILE. Open transactions may cause the DBCC operation to fail, and possibly corrupt your tempdb!
DBCC SHRINKFILE (TEMPDEV, 20480);   --- New file size in MB
GO


Don’t set the new size too low! Make a realistic estimate of the largest “normal” size that tempdb will assume during normal day-to-day operation.
That’s it. If everything works the way it should, you should now be able to verify the new size of tempdb.
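One way to verify it is to look at the tempdb files directly (size is reported in 8 kB pages, so divide by 128 for MB):

-- current size of the tempdb files, in MB
SELECT name, size / 128 AS size_mb
FROM tempdb.sys.database_files;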

A word about shrinking database files

Best practice is to try to minimize the use of file or database shrinking as much as possible. Whenever you shrink a database file and it re-grows later on, you are potentially creating fragmentation on your physical storage medium. This is because the sectors that the file used to occupy may now very well be occupied by other information (just a few bytes are enough). When SQL Server wants to grow that database file, the newly added portion of the file will need to be placed elsewhere on the disk, thus creating fragmentation.


The number one mortal sin in this context is “autoshrink”, because it may very well add to the drive fragmentation every time it runs, which could be very frequently.
As a rule of thumb, never ever autoshrink a database. And try to be very restrictive when it comes to shrinking databases or files in general, unless it’s a one-off operation to fix the aftermath of a runaway query.

SQL joins illustrated


[Venn-diagram illustrations of the SQL join types – images not reproduced here.]
Enjoy!

Reading a query plan


Knowing how to read a query plan is absolutely key to optimizing SQL Server query performance. The query plan tells you how SQL Server goes about running your query, including what indexes are used (and how), what join strategies are applied and a lot of other information. If you can read the query plan, you can make the appropriate changes to indexes, query hints, join conditions, etc to tune your workload for optimum performance.

How to view a query plan

You can show an estimated query plan by selecting the query you want to evaluate and press Ctrl+L. This will set the SHOWPLAN flag in SQL Server, so the execution of the query is limited to just displaying the query plan, not actually executing the query.
The other way is to enable “Include actual query plan” (press Ctrl+M to toggle). The difference between these two is that the actual query plan is displayed as the query runs. The estimated query plan is, like the name implies, just an estimate. In my opinion, the two rarely differ significantly.
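If you prefer T-SQL over keyboard shortcuts, the same switches can be set per session. For example, for the estimated plan as XML (the SELECT below is just a stand-in query):

SET SHOWPLAN_XML ON;
GO
SELECT * FROM HumanResources.vEmployee;   -- not executed, only the plan is returned
GO
SET SHOWPLAN_XML OFF;
GO

-- or, to capture the actual plan along with the results:
SET STATISTICS XML ON;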
Here’s an example of how your query plan might look.

Query plan for HumanResources.vEmployee

The flow of it

Query plans are read from right-to-left; that is, each table or index that is collected along the way is a “leaf” node, all the way out to the right, and the final result of the operation is the top-left node. In the example plan above, you can see that we’re joining 7 different tables into one big SELECT query.
It’s important to understand that while T-SQL is merely a language that describes what you want to do with your query, the actual query plan may actually differ significantly from how you write the query. The reason for this is that the SQL Server optimizer normally does a very good job in finding the most efficient method to solve the task at hand, but that method may not follow for instance the join order or methods that you imagined. It all comes down to how the data is physically arranged in the database.

Indexes and statistics

The primary help for the query optimizer in this task are the different indexes and statistics that exist on the different objects that are referenced in the query. For this reason, making sure that tables are well-indexed and that their statistics are up-to-date can dramatically improve your query performance.

Common query plan operators

Here’s a fairly common selection of operators you will probably encounter in query plans:
Clustered index seek / index seek
An index seek is where the server looks for a single value or a closed interval of values. Unless you want to return the entire index/table, a seek is generally preferable to a scan (see below).
Clustered index scan / index scan
An index scan occurs when it’s not possible to “seek” a value in the index, for instance because you’re using a wildcard condition. It is also applied when you deliberately want to make use of the entire index or (clustered) table.
Table scan
If this operator turns up, you’ve forgotten to create a clustered index on your table. It goes through the entire table, top to bottom, which is resource-consuming, takes a heavy toll on the I/O and results in very poor performance. A table scan only happens when there is no suitable index available to complete the query.
Nested loops
Nested loops is a way to JOIN two tables. This method is mostly practical when one table is considerably larger than the other. For each row in the “outer” table, it loops through the “inner” table.
Merge join
A merge join is the single most optimal join method from a performance standpoint, but in order to achieve it, the key columns of the two data sets to be joined need to match perfectly with regard to data types and sort orders.
Hash match / hash join
This is the JOIN method applied if a nested loop or merge join is not feasible. It uses internal hash tables to join two data sets.
Filter
The filter operator most closely resembles the WHERE or HAVING clauses, where these operations cannot be performed directly on the index or table.
Constant scan
Represents the use of constants, without the need to retrieve any value from an actual index or table, for instance
SELECT 123 AS my_integer;
Compute scalar
Any scalar calculation performed, for example arithmetic operations, string functions, etc.
Sort
Sorting the dataset can be performed at the end of the query in order to comply with an ORDER BY directive, or for instance before a JOIN operator in order to prepare the data set for a more optimized join method.

Tooltip information

If you hover over a node in the query plan, you’ll see a lot of detailed information about each node. The most important ones to look for are the estimated I/O and CPU costs (how much disk input/output workload and computing power respectively that the step will consume), as well as the subtree cost.
Also, take a look at the output list and/or predicates, at the bottom of the tooltip. They provide more detail about which columns are used in a join operator, which columns are output from an index scan, etc.

More details

Again, this is but a selection of common query plan operators that you’re likely to encounter. There are a lot more operators, with more detailed descriptions, available on MSDN. A deeper discussion of the performance aspects and requirements of different JOIN and indexing strategies will be the topic of a separate post. Stay tuned.

Monday 15 December 2014

How to install SQL Server on an active/passive cluster?


SQL Server clustering is a collection of two or more physical servers, known as nodes, with access to shared resources, forming a public network that serves as the interface for applications. The SQL Server service, the SQL Server Agent service, and disks are examples of shared resources. The nodes communicate constantly with each other through an internal private network inside the cluster, also known as the heartbeat. The heartbeat checks whether a node is up and running. Only one node in an active/passive cluster can access the shared resources at a time. When the active node loses its heartbeat or communication with the other node in the cluster, the passive node takes over ownership of the shared resources to service all requests coming into the cluster. This is known as failover.
SQL Server clustering is created on top of Microsoft Cluster Services (MSCS). MSCS is capable of detecting hardware and software failures and automatically transferring control of the resources to a functioning node when needed. To coordinate distributed transactions between two or more nodes, the Microsoft Distributed Transaction Coordinator (MSDTC) service is needed. Within a cluster, the quorum drive contains up-to-date information about the state of the cluster. Without it, the cluster can’t function at all.
Prior to installing SQL Server on a cluster, you should ask or work with the System Administrators to install and configure the Microsoft Cluster Services (MSCS). In addition, you should have your checklist ready for verification and preparation for the installation. It is recommended that you write down the answer next to each of the check items. The following is an example of such list:
  1. Know the nodes for the cluster and their ip addresses and access to both nodes.
  2. The SQL Server installation product key.
  3. The SQL Server components to be installed.
  4. The SQL Server virtual server name.
  5. Default or named instance?
  6. The shared disks to select.
  7. The SQL Server virtual IP address.
  8. The subnet mask. Usually it is 255.255.255.0
  9. The dedicated domain account for the SQL Server service.
  10. The dedicated domain account for SQL Server Agent service.
  11. The dedicated domain accounts for each of the services to be installed (SSAS, SSIS, and/or SSRS).
  12. The SQL Server Collation to select.
  13. Windows or Mixed authentication mode?
  14. The list of users to be the SQL Server administrator for the instance.
  15. The default locations for the data and log files for the user databases and tempdb, and also the location for the backup directory.
  16. Do you need to enable FILESTREAM?
Before running the installation wizard, you should run the validation wizard for your cluster configuration to make sure all components are compliant:
  1. Click the “Start” button then click on “Administrative Tools” and then click on “Failover Cluster Management.”
  2. In “Failover Cluster Manager,” click on the “Validate a Configuration” link.
  3. In “Select Server or a Cluster,” browse to or enter the cluster name or the nodes.
  4. In “Testing Options,” select the option “Run all tests (recommended)” and then click “Next.”
  5. In “Review Storage Status,” select the option “Take the listed services or applications offline, so that the associated disks can be tested. These services or applications might be unavailable for a significant time. They will be brought online again after the tests.”
  6. Click “Next.”
  7. Review the “Failover Cluster Validation” report.
After you have filled in the checklist items and run the cluster configuration validation check, you are ready for the installation of SQL Server on the cluster. Here we go:
  1. Insert the CD or on the installation folder, run setup.
  2. On the “SQL Server Installation Center” page, click “Installation” link on the left hand column, then click on the link “New SQL Server failover cluster installation” on the right.
  3. On the “Setup Support Files” page, click the “Install” button.
  4. The System Configuration Checker runs a discovery operation on your computer. Review the details by clicking on the “Show details >>” button if necessary. Click the “Ok” button to continue.
  5. On the “Product Key” page, key in the product key and then click the “Next” button.
  6. On the “License Terms” page, check the “I accept the license terms” checkbox and then click the “Next” button to continue.
  7. On the “Setup Support Rules” page, review and correct the errors and warnings, if there are any, before proceeding. If everything is green (successful), click the “Next” button to continue.
  8. On the “Feature Selection” page, select the components that you actually need for your installation, not everything you might wish to have.
  9. On the “Instance Configuration” page, key in the SQL Server virtual name in the “SQL Server Network Name” textbox and also determine if you wish to have a default instance or a named instance. Make the selection with the corresponding radio button. If you selected the “Named instance” option, you must key in the name for the named instance. By default, the instance name is used as the Instance ID. You can leave it as the default. Click the “Next” button to continue.
  10. On the “Disk Space Requirements” page, the installer checks if you have enough disk space on your local disks to install the SQL Server 2008 binaries. You should see the green check mark(s) indicating that you have sufficient amount of disk space for the installation. Click the “Next” button to continue.
  11. On the “Cluster Resource Group” page, select “SQL Server (MSSQLSERVER)” in the drop-down box for “SQL Server cluster resource group name:” or you can type in a new one as the cluster resource group. Click the “Next” button to continue.
  12. On the “Cluster Disk Selection” page, select the shared disks for the SQL Server resource cluster group. Click the “Next” button to continue.
  13. On the “Cluster Network Configuration” page, specify the “IP Address” and “Subnet Mask” for the SQL Server cluster. Deselect the checkbox for “DHCP” as you will be using static IP addresses. Uncheck “IPv6” if it is checked. Click the “Next” button to continue.
  14. On the “Cluster Security Policy” page, select the option “Use service SIDs (recommended).” This is the recommended option for Windows Server 2008 and later.
  15. On the “Server Configuration” page and in the “Service Accounts” tab, specify the “Account Name” and “Password” for the SQL Server Agent service, the SQL Server Database Engine service, and other services that you had selected in the components section page. Microsoft recommends individual accounts for each of the services in addition to following the principle of Least Amount of Privileges.
  16.  Still on the “Server Configuration” page, click the “Collation” tab and then specify the collation as per the application(s) requirements. Don’t select the default unless you are absolutely sure or else it will be a royal pain in the behind to correct this. Stop and confirm the correct collation before moving on. Seriously! Once you have selected the collation, click the “Next” button to continue.
  17. On the “Database Engine Configuration” page and in the “Account Provisioning” tab, select the option “Windows authentication mode” or the option “Mixed Mode (SQL Server authentication and Windows authentication).”  If you had selected the latter, then you must key in the “sa” password. In the “Specify SQL Server administrators” section, click the “Add Current User” button to add the user running the installation as an administrator. You can also add more users by clicking on the “Add…” button.
  18. Still on the “Database Engine Configuration” page, click the “Data Directories” tab to specify the default directories for the data, the user databases’ data and log files, tempdb data and log, and backups.
  19. Still on the “Database Engine Configuration” page, click the “FILESTREAM” tab to enable Filestream on this instance if needed. Click the “Next” button to continue.
  20. On the “Error and Usage Reporting” page, check the corresponding optional checkboxes if you wish then click the “Next” button to continue.
  21. On the “Cluster Installation Rules” page, the installation validation system validates the configuration and selection of the specified SQL Server components. If everything passed, click the “Next” button to continue.
  22. On the “Ready to Install” page, you can see the components that were selected. Click the “Install” button to start the installation.
  23. On the “Installation Progress” page, you can see the installation progress status as the bar moves to the right.
  24. Cross your fingers and wait for the “Complete” page and hope to see the green check mark “Your SQL Server 2008 failover cluster installation is complete.” Click the “Close” button to end.
You will now need to add the second node in the failover cluster to enable high-availability.
  1. On the passive node, start the installation.
  2. On the “SQL Server Installation Center” page, click the “Installation” link on the left hand column and click on the “Add node to a SQL Server failover cluster” link on the right.
  3. On the “Setup Support Rules” page, verify that the checks all return successful results, then click the “Ok” button to continue.
  4. On the “Product Key” page, key in the product key and then click the “Next” button.
  5. On the “License Terms” page, check the “I accept the license terms” checkbox and then click the “Next” button to continue.
  6. On the “Cluster Node Configuration” page, select the virtual SQL Server instance name on the “SQL Server instance name” drop-down box. Click the “Next” button to continue.
  7. On the “Service Accounts” page, specify the “Account Name” and “Password” for the SQL Server Agent service, the SQL Server Database Engine service, and other services that you had selected in the components section page. Microsoft recommends individual accounts for each of the services in addition to following the principle of Least Amount of Privileges. Once done, click the “Next” button to continue.
  8. On the “Error and Usage Reporting” page, review and correct the errors and/or warnings if needed and then click the “Next” button to continue.
  9. On the “Ready to Add Node” page, review the configurations and selections you have made and then click “Next” to continue.
  10. On the “Add Node Progress” page, the status progress bar will move to the right end as per the installation progress.
  11. On the “Complete” page, nothing is sweeter than seeing the green check mark next to “Your SQL Server 2008 failover cluster add node operation is complete.” Click the “Close” button to end.
This is what the SQL Server cluster looks like in the Cluster Admin (screenshot not reproduced here).
After verifying that the latest service pack is compatible with the application(s), you can then apply the latest service pack or hotfixes on the Active/Passive SQL Server cluster with the following steps:
  1. On the passive node (Node2), apply the service pack or hotfixes.
  2. Reboot the passive node (Node2).
  3. On the active node (Node1), failover the SQL Server resource. The passive node (Node2) that you had already patched will become the active node. To failover the SQL Server resource, open the “Failover Cluster Management,” under “Services and Applications,” right-click on “SQL Server (MSSQLSERVER),” then click on “Move this service or application to another node >” and click on the option to move to the passive node.
  4. On the passive node (Node1), apply the service pack or hotfixes.
  5. Reboot the passive node (Node1).
You can verify the current service pack and version build number by running the following query:
-- Querying the SQL Server instance-level info
SELECT
    SERVERPROPERTY('ServerName') AS [SQLServer]
    ,SERVERPROPERTY('ProductVersion') AS [VersionBuild]
    ,SERVERPROPERTY('Edition') AS [Edition]
    ,SERVERPROPERTY('ProductLevel') AS [ProductLevel]
    ,SERVERPROPERTY('IsIntegratedSecurityOnly') AS [IsWindowsAuthOnly]
    ,SERVERPROPERTY('IsClustered') AS [IsClustered]
    ,SERVERPROPERTY('Collation') AS [Collation]
    ,SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS [CurrentNodeName];