Daily DBA Morning Checklist
Database Administrators can
sometimes have one of the most stressful jobs in the company. If you
have been a DBA for long, you know the scenario. You have just sat in
your chair with your cup of coffee, and your phone starts ringing off
the hook. The voice on the other end states that they can’t pull up
their data or they are getting timeouts, or the system is running
slow. Okay, time to dig in; it’s going to be one of those days! Is
it Friday yet?
In
this article, I will present ways to minimize those stressful days by
having a pre-defined DBA morning checklist. A morning DBA checklist is a
document of pre-defined administrative checks that are performed every
morning to ensure that your server is at optimal performance. By having
a standard list of items to check, you are more likely to catch and fix
issues before there is a real problem.
The end result of the morning DBA checklist should have three sections. Section one contains the list of items that need to be checked, drawn from the following categories: performance, job failures, disk space, backups, connectivity, and anything specific to your environment, such as replication, mirroring, or clustering. Section two provides a place to write down issues and how they were resolved. Section three is a confirmation section where the checklist is signed and dated. This section is very important: without it, it is difficult to enforce and guarantee that the checks were performed.
The first step to create an effective morning checklist is to meet with all the DBAs and ask them these questions:
- What do you check in the morning?
- How do you check it?
- What do you do when there is a problem?
- Is there anyone you notify in the event of a failure?
In my experience, every DBA has their own mental checklist and their own ways of fixing issues. It is important to get these items written down in a document. By combining the ideas of every DBA, you will come up with a more thorough checklist and a standardized way to fix issues, and problems will be less likely to fall through the cracks.
After
the DBA morning checklist is created, completed checklists should be
archived in a notebook to ensure that each check was performed every
day. This also serves as a history of fixes for past issues, and an
audit trail for the DBA.
Since
every database environment is different, and every IS shop has its own
tools, every DBA’s checklist will be different. The end goal is to
create a checklist that is customized to your environment, in which
issues can be found and fixed quickly, so that you can avoid having one
of those difficult days.
With
this in mind, listed below is a sample checklist. Your checklist
should be unique to your environment and should help find and fix issues
as quickly as possible.
Section 1: DBA Morning Checklist
Backups
☐ Verify that the network backups are good by checking the backup emails. If a backup did not complete, contact _____ in the networking group, and send an email to the DBA group.
☐ Check the SQL Server backups. If a backup failed, research the cause of the failure and ensure that it is scheduled to run tonight.
☐ Check the database backup run duration on all production servers. Verify that the average time is within the normal range. Email the networking group about any significant increase in backup duration and request an explanation. The reason is that networking starts copying database backups to tape at certain times, and if a backup is copied to tape before the DBAs have finished writing it, the tape copy will be bad.
☐ Verify that all databases were backed up. If any new databases were not backed up, create a backup maintenance plan for them and check the current schedule to determine a backup time.
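As an illustration of the backup duration check, here is a minimal Python sketch. It assumes you have already collected past and latest backup durations (in minutes) from your own backup history. The function name `flag_slow_backups`, the 25% tolerance, and the sample figures are all hypothetical and should be tuned to what counts as "normal" in your environment.

```python
from statistics import mean


def flag_slow_backups(history, latest, tolerance=0.25):
    """Return databases whose latest backup ran noticeably longer than usual.

    history:   dict of database name -> list of past durations in minutes
    latest:    dict of database name -> most recent duration in minutes
    tolerance: assumed allowed fractional increase over the historical average
    """
    flagged = {}
    for db, minutes in latest.items():
        past = history.get(db)
        if not past:
            # No baseline yet (for example, a new database); nothing to compare.
            continue
        baseline = mean(past)
        if minutes > baseline * (1 + tolerance):
            flagged[db] = (baseline, minutes)
    return flagged


# Made-up sample data, for illustration only.
history = {"Customer": [40, 42, 38], "Billing": [20, 22, 21]}
latest = {"Customer": 41, "Billing": 35}

for db, (baseline, minutes) in flag_slow_backups(history, latest).items():
    print(f"{db}: {minutes} min vs. average {baseline:.1f} min")
```

With this sample data only the Billing backup is reported, because 35 minutes is more than 25% above its 21-minute average.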
Disk Space
☐ Verify the free space on each drive of the servers. If free space varies significantly from the day before, research the cause and resolve it if necessary. Log files often grow because of monthly jobs.
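The day-over-day free space comparison could be sketched in Python as follows. This is only an outline under stated assumptions: the drive letters, the 5 GB threshold, and the helper names `free_gb` and `space_changes` are hypothetical, and how you store yesterday's readings is up to you.

```python
import shutil


def free_gb(path):
    """Current free space, in gigabytes, on the drive that contains path."""
    return shutil.disk_usage(path).free / 1024**3


def space_changes(yesterday, today, threshold_gb=5.0):
    """Return drives whose free space moved by more than threshold_gb.

    yesterday, today: dict of drive label -> free space in gigabytes
    threshold_gb:     assumed size of change worth investigating
    """
    changes = {}
    for drive, free_now in today.items():
        if drive not in yesterday:
            # New drive with no previous reading; skip the comparison.
            continue
        delta = free_now - yesterday[drive]
        if abs(delta) > threshold_gb:
            changes[drive] = delta
    return changes


# Made-up readings, for illustration only.
yesterday = {"D:": 120.0, "L:": 80.0}
today = {"D:": 118.5, "L:": 52.0}

for drive, delta in space_changes(yesterday, today).items():
    print(f"{drive} free space changed by {delta:+.1f} GB")
```

In practice, `today` would be filled by calling `free_gb` for each drive on the server being checked and saving the result for tomorrow's comparison.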
Job Failures
☐ Check for failed jobs by connecting to each SQL Server, selecting “Job Activity”, and filtering on failed jobs. If a job failed, resolve the issue, contacting the owner of the job if necessary.
System Checks
☐ Check the SQL logs on each server. In the event of a critical error, notify the DBA group and come to an agreement on how to resolve the problem.
☐ Check the Application log on each server. In the event of a critical or unusual error, notify the DBA group and the networking group to determine what needs to be done to fix the error.
Performance
☐ Check performance statistics for all servers using the monitoring tool, and research and resolve any issues.
☐ Check Performance Monitor on all production servers and verify that all counters are within the normal range.
Connectivity
☐ Log into the Customer application and verify that it can connect to the database and pull up data. Verify that it is performing at an acceptable speed. In the event of a failure, email the Customer Support Group, the DBA group, and the DBA manager before proceeding to resolve the issue.
☐ Log into the Billing application and verify that it can connect to the database and pull up data. Verify that it is performing at an acceptable speed. In the event of a failure, email the Billing Support Group, the DBA group, and the DBA manager before proceeding to resolve the issue.
Replication
☐ Check replication on each server by checking each publication to make sure the distributor is running for each subscription.
☐ When replication is stopped, or changes to replication are made, send an email to the DBA group. For example, if a DBA stops the distributor, let the other DBAs know when it is stopped and again when it is restarted.
☐ Check for any emails from the SQL jobs that monitor row counts on major tables on the publisher and subscriber. If a wide variance occurs, send an email message to the DBAs and any appropriate IS personnel.
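The row count comparison between publisher and subscriber can be expressed as a percentage difference per table. The Python sketch below is one hypothetical way to define a "wide variance". The 1% limit, the table names, and the function name `row_count_variance` are assumptions, not part of any monitoring job described above.

```python
def row_count_variance(publisher, subscriber, max_pct=1.0):
    """Return tables whose subscriber row count differs from the
    publisher's by more than max_pct percent.

    publisher, subscriber: dict of table name -> row count
    max_pct:               assumed acceptable difference, in percent
    """
    out = {}
    for table, pub_rows in publisher.items():
        # A table missing on the subscriber is treated as having zero rows.
        sub_rows = subscriber.get(table, 0)
        if pub_rows == 0:
            pct = 0.0 if sub_rows == 0 else 100.0
        else:
            pct = abs(pub_rows - sub_rows) * 100 / pub_rows
        if pct > max_pct:
            out[table] = round(pct, 2)
    return out


# Made-up counts, for illustration only.
publisher = {"Orders": 1_000_000, "Invoices": 250_000}
subscriber = {"Orders": 999_800, "Invoices": 230_000}

for table, pct in row_count_variance(publisher, subscriber).items():
    print(f"{table}: subscriber differs from publisher by {pct}%")
```

Here Orders differs by only 0.02% and is ignored, while Invoices differs by 8% and would trigger the email to the DBAs.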
Section 2: Write down any issues and how they were resolved
This space is reserved for writing down issues and how they were fixed.
Section 3: Confirmation
Completed By: __________________________ Date: ___________________
Conclusion:
Creating a morning DBA checklist has helped me many times in the past. I have often found CPU usage near 100%, broken replication, connectivity problems, and space issues that I was able to resolve before most of the workforce arrived and the issue could escalate. A standard DBA checklist document ensures that nothing is forgotten that could later result in a problem. It also minimizes downtime for a company or department, provides an archive of past issues and how they were fixed, and helps ensure that the DBA will have a less stressful day!