
| Systems Administration Toolkit: Log file basics | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 摘自: IBM developerWorks Worldwide 被阅读次数: 83 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
由 yangyi 于 2008-04-25 19:07:37 提供 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Level: Intermediate Martin Brown (mc@mcslp.com), Professional Writer, Freelance 26 Feb 2008 A typical UNIX® or Linux® machine creates many log files during the course of its operation. Some of these contain useful information; others can be used to help you with capacity and resource planning. This article looks at the fundamental information recorded within the different log files, their location, and how that information can be used to your benefit to work out what is going on within your system. The typical UNIX administrator has a key range of utilities, tricks, and systems he or she uses regularly to aid in the process of administration. There are key utilities, command line chains, and scripts that are used to simplify different processes. Some of these tools come with the operating system, but a majority of the tricks come through years of experience and a desire to ease the system administrator's life. The focus of this series is on getting the most from the available tools across a range of different UNIX environments, including methods of simplifying administration in a heterogeneous environment.
All systems generate a varying quantity of log files that track and record different information about your machine. The content and utility of these files varies from system to system, but the core information provided by the files is often consistent. For example, all UNIX and Linux machines use the syslog, a generic logging system that is used by the operating system and applications and services to log information. The syslog records a whole host of data, including logins, performance information, and failures reported by different hardware and systems. In addition to the syslog, systems also have a variety of service, environment, and application logs that record information about the machine and its operation. Although parsing and extracting the content of the log files for information can be time consuming and sometimes complex, the wealth of information in those logs is difficult to ignore. The log file can provide hints on potential problems, faults, security lapses and, if used correctly, can even help provide warnings on load and capacity of your servers.
The location of the various log files varies from system to system. On most UNIX and Linux systems the majority of the logs are located in /var/log. For example, Listing 1 shows a list of logs located on a Gentoo Linux system. Listing 1. Linux /var/log directory contents
On Solaris®, IBM® AIX®, and HP-UX®, the main syslog and most of the other logs are written to files within the /var/adm directory (Listing 2). Listing 2. Traditional UNIX /var/adm contents
In addition, some non-system-level messages and information are written into logs located within /var/log (Listing 3). For example, on Solaris, by default, mail debug entries are written into /var/log/syslog. Listing 3. Additional logs in /var/log on Solaris
Of course finding the files is the least of the issues. You need to know what the files contain for the information to be of any use. Depending on the UNIX variants, some logs may be littered about in other places, but there has been a significant attempt to standardize on log file locations to one of the directories already mentioned.
Log types fall into two categories, text log files that contain messages and information in a simple text format, and files that are encoded in a binary format. The former is used for most of the logs in your typical system as they are easy to write and, perhaps more importantly, easy to read. The issue with text files is that they can sometimes be difficult to extract information from in a structured way, because the text format of the files allows the information to be written in any way or structure. The latter format is more practical for very structured information, or for information that needs to be written in a particular way or format. For the example, the utmp and wtmp data is written to a file in fixed blocks of binary data so that the information can read and be written in a quick and efficient format. Unfortunately, this means that the information is difficult to read without using special tools. The syslog service is a daemon that runs the background and accepts log entries and writes them to one or more individual files. All messages reported to syslog are tagged with the date, time, and hostname, and it's possible to have a single host that accepts all of the log messages from a number of hosts, writing out the information to a single file. Messages are also identified by the service that raise the issue (for example, mail, dhcp, kernel), and a class indicating the severity of the message. The severity can be marked as info (purely for information), warning, error, critical (a serious problem that needs addressing), and even emergency (the system needs urgent help). The service is highly configurable (generally through /etc/syslog.conf, or the equivalent), and allows you to select what classes of information to log, and where to log the information. For example, you can write all the standard information out to a file. But for critical messages, where administrators need the information right away, these messages can be sent immediately to the console. Listing 4 shows the main configuration content of the default syslog.conf file from a Solaris 10 installation. Listing 4. Sample syslog.conf file
Because syslog is a standard logging mechanism within UNIX/Linux it is used to record a massive array of different information. This includes boot messages, login and authorization information, and service startup/shutdown. In addition, syslog is often used to record e-mail messages delivery, filesystem issues, and even DHCP leases, DNS issues, and NFS problems. Because syslog can write the data to different areas, it's not always obvious that syslog is writing the information. The main destination for the on-disk copy of the syslog differs between UNIX variants. Many Linux solutions write the information to /var/log/messages. On AIX, Solaris, and HP-UX, the syslog is written to /var/adm/messages. You can see a sample of /var/adm/messages from a Solaris machine in Listing 5. Listing 5. Sample system log output
You can see in the sample output that there is a wide range of information here, from problems and issues with hardware devices through to issues with the current configuration of the mail service. The format of the file is quite straightforward: it contains the date, hostname, service name, a unique ID (to enable the system to log multi-line messages and have them identified), and the identifier and class of the entry. The remaining text on each line is just free-form text from the system logging the error message. The format of the file makes it easy to pull out the information you want. All the lines in the file are tagged with a unique ID and all lines are tagged with the identifier and class of the error message. For example, you can pull out information on critical issues with the mail system by using grep to pick out the entries tagged with mail.crit: To process the detail of the individual lines within the log is more complex. Although the first few columns within the file are standardized (they are written by the syslog daemon), the format of the remainder of the line is entirely dependent on the component reporting the error message. This can make it complex to read and parse the contents of the file, as you will need to treat each line according to the identifier and reporter. Even then, some lines will not follow a format. All UNIX and Linux systems have a log that is actually part of the kernel. In practice the log is actually a section of memory in the kernel used to record information about the kernel that may be impossible to write to disk because the information is generated before the filesystems are loaded. For example, during the boot process, the filesystems are not accessible for writing (most kernels boot with the filesystem in read mode until the system is considered safe enough to switch to read/write mode). The data in this log contains information about the devices connected to the system and any faults and problems recorded by the system during the boot and operational process. On some systems the information is periodically dumped into a file (/var/log/dmesg); on others it is only available by using the alog command (AIX) or dmesg (all other UNIX/Linux variants). The information generated by the kernel is not always written out to another file, such as syslog. This can mean that certain pieces of information, such as internal data on devices and hardware, is only available through the dmesg log. For example, Listing 6 shows some sample output from dmesg on a Gentoo Linux system. Here it is showing the main boot information, trimmed for brevity. Listing 6. The dmesg log contents
Listing 7 shows the output from another machine running Gentoo Linux, and in this example you can see some faults being reported by a running filesystem. Listing 7. Disk error from dmesg
From Listing 7, you can see that you probably need to check the filesystem, as there appears to be a fault on the filesystem or disk. In this instance, the information was also reported in the syslog (Listing 8). Listing 8. Disk error in syslog
But in the case of a serious fault or failure, dmesg can sometimes be your only good source of information on what is happening on your system.
User records (utmp/x, wtmp/x, lastlog) These files contain the user login and system data logs. Information in these files is written in the special utmp format and so you will need special tools to extract the information. The data held within these logs records login times and system startup/shutdown times, both for a historical record of the logins and for quick access to the last boot or login time used during login. See Resources for another article within the System Administration Toolkit that contains information on how to parse these files. The cron time daemon, which is responsible for running many services in the background at periodic intervals, generates its own logs of information. On some systems, the cron log is recorded using syslog, but on Solaris and some traditional UNIX variants, the information is written to the file /var/cron/log. The information contained in the log includes the details of the command executed and when the job started and stopped. For an example of the log contents, see Listing 9. Listing 9. The log of cron activity
Parsing the contents of the log can be an effective way to determine any problems with jobs that don't seem to execute properly. It can also be a good way to check the execution time of a job. Long-running jobs, or jobs that never seem to have finished, probably indicate a problem that should be investigated.
You should make sure that you manage the logs on your systems. Log files can grow very large and in many cases you will want to keep an historical record of events on your machine for problems. For example, a phantom reboot or shutdown of a system should be investigated, and often the system logs are the only source of information. Although it cannot tell you everything that was taking place at the time the failure occurred, you may get information that helps, such as the precise time of the failure, or information about events that led up to the problem. Potential security problems and login attempts may indicate that your machine was being hacked and that may have led to or even been the cause of the problem. Keeping months and months of logs is probably not necessary (although it may under some circumstances be a legal requirement). On a busy system you can easily record 25MB or more information each day to the system logs, and logs are frequently the cause of insufficient disk space errors. Some UNIX/Linux variants include an automatic log management process (Solaris includes the /usr/sbin/logadm command), but it is not that difficult to create your own. A typical arrangement is to keep individual logs for a short period of time (for example, four weeks) and number them sequentially. For example, if you have the file messages, last week's file is in messages.1, the two-week-old file is in messages.2, and so on. This makes migration of the files very easy. You must, however, be careful that you can successfully copy and recreate the file so that you do not lose any significant amount of information in the migration and archiving process. For the old files, to save space, you can also archive the content. Listing 10 shows a simple script that will copy and archive individual files into a suitably named directory within the original location. Listing 10. Simple log-archiving facility
Running the script copies, recreates, and archives the log files. Note how the files are migrated; we just move the current to the week older in each case. Then finally we archive and recreate the original file.
Creating new AD users and groups from AIX Log files can contain a whole wealth of information, but understanding the depth of the information and the format of the files helps immensely when you are trying to diagnose and resolve problems. This article has looked at the basics of log files, their location, and the details of the contents of those files and how they can help you diagnose problems and identify issues before they become problems. The article also examined the format of the different files and the relationships between different files and their contents. Learn
Get products and technologies
Discuss
Original link: http://www.ibm.com/developerwork... | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||