Getting used to the Knowledge Base
This section is important. Do not enlarge your configuration until you know how to handle the Knowledge Base. To handle the Knowledge Base you have to understand the following three points:
- how is the Knowledge Base organized
- how do I create a filter for state messages
- how do I tell the server that I changed the Knowledge Base
Organization of the Knowledge Base
When you start gossips for the first time, it builds the Knowledge Base directories. The Knowledge Base is the collection of all system states that occurred on your monitored hosts. It contains a folder for each installed test. In these folders gossips stores all treated problems, that is, problems to which you have added a solution. An additional folder is called 'jobs'. It contains the generated Knowledge Base files for each new state that is found by your monitor clients. Let's take a look at the Knowledge Base structure:
.
|-- KNOWLEDGE_BASE/ Knowledge Base
| |-- jobs/
| | |-- archive/
| | |-- Error-2001_10_25-14_48_44
| | `-- Error-2001_10_25-14_48_44_log
| |-- Test_Disks/
| | |-- archive/
| | |-- Error-2001_03_21-10_52_48
| | `-- Error-2001_03_21-10_52_48_log
| |-- Test_Ping/
| | `-- archive/
| |-- ...
|
|-- LOG/ Log files of the daemons
| |-- archive/
| |-- SERVER.log
| |-- SERVER.error
| |-- drwho.log
| |-- drwho.error
| |-- ...
|-- bin/ executables (gossips daemon and initd-scripts)
|-- doc/ documentation
|-- lib/ libraries and modules
`...
The Knowledge Base files (for example Error-2001_10_25-14_48_44) are in ASCII format, so you may 'edit', 'grep' or 'create' them with your own tools. However, I suggest using the command-line interface to handle the Knowledge Base. The command-line interface is called 'gossipc'. It has several functions to modify or test your gossips environment. Read more about the usage of the command-line interface with 'gossipc --man'.
During runtime gossips creates 'Error-files' and stores them in the 'jobs' folder. At this point you can bring in your knowledge about the monitored environment.
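Because the files are plain ASCII, your own tools can also work with the file names. The following is a minimal Python sketch, assuming that the name 'Error-2001_10_25-14_48_44' encodes year, month, day, hour, minute and second in that order. This is inferred from the listing above, not from the gossips sources, and the helper name error_file_timestamp is hypothetical.

```python
from datetime import datetime

def error_file_timestamp(name):
    """Return the creation time encoded in a Knowledge Base file name.

    Assumption: names look like 'Error-YYYY_MM_DD-hh_mm_ss', optionally
    followed by '_log', as in the directory listing above.
    """
    base = name[:-len("_log")] if name.endswith("_log") else name
    return datetime.strptime(base, "Error-%Y_%m_%d-%H_%M_%S")

# Sort a few file names from the listing by the time they encode.
names = ["Error-2001_10_25-14_48_44", "Error-2001_03_21-10_52_48_log"]
for name in sorted(names, key=error_file_timestamp):
    print(name, error_file_timestamp(name).isoformat())
```

Such a helper can be useful to find the oldest untreated files in the 'jobs' folder before you start editing them.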
This is an example of an error file:
*** Test_DiskS ***
Configuration = sun
RegEx = less than 50 M on disk /usr/home-a
Priority = 0
Transfer = mail
*** Problem-Description ***
*** Problem-Solution ***
An error file has three sections.
- Information on the occurred error. ('Configuration', 'RegEx', 'Priority', 'Transfer')
- Space for a 'Problem-Description'
- Space for a 'Problem-Solution'
The 'Configuration' field tells you in which group the error occurred, meaning the group of the gossips client that sent the message. The 'RegEx' field contains the message that was sent from the client. The 'Priority' will be explained later. The 'Transfer' mode sets the status of notification.
In the 'Problem-Description' and 'Problem-Solution' sections you can describe the problem and add your solution. The next time the problem shows up, you will receive this text within the notification message. Use 'gossipc -edit Error-2001_10_25-14_48_44' to edit the Knowledge Base file.
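If you want to read error files with your own tools, the three sections described above can be separated with a few lines of code. This is only a sketch in Python, assuming the layout shown in the example ('*** name ***' header lines and 'key = value' fields). The function name parse_error_file is hypothetical and not part of gossips.

```python
import re

# A section header looks like '*** Test_DiskS ***'.
SECTION_RE = re.compile(r"^\*\*\* (.+?) \*\*\*$")
TEXT_SECTIONS = ("Problem-Description", "Problem-Solution")

def parse_error_file(text):
    """Split an error file into test name, header fields and two text sections."""
    entry = {"test": None, "fields": {},
             "Problem-Description": "", "Problem-Solution": ""}
    current = None  # stays None until the first header line is seen
    for line in text.splitlines():
        header = SECTION_RE.match(line.strip())
        if header:
            name = header.group(1)
            if name in TEXT_SECTIONS:
                current = name
            else:
                entry["test"] = name   # first header carries the test name
                current = "fields"
            continue
        if current == "fields" and "=" in line:
            key, value = line.split("=", 1)
            entry["fields"][key.strip()] = value.strip()
        elif current in TEXT_SECTIONS and line.strip():
            entry[current] += line.strip() + "\n"
    return entry

SAMPLE = """*** Test_DiskS ***
Configuration = sun
RegEx = less than 50 M on disk /usr/home-a
Priority = 0
Transfer = mail
*** Problem-Description ***
*** Problem-Solution ***
"""

entry = parse_error_file(SAMPLE)
print(entry["test"], entry["fields"]["Transfer"])
```

For editing the files, 'gossipc -edit' remains the recommended way. The sketch is only meant for read-only reports.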
How to handle the Knowledge Base
Some recommendations to minimize the number of ERROR-files in the Knowledge Base and to use it efficiently:
'jobs' - Folder
The 'jobs' folder contains all new ERROR-files of the Knowledge Base. Gossips creates all its Knowledge Base files in this folder. In the directory listing above you can see that the Knowledge Base has a folder for each test. Once you have treated an ERROR-file in the 'jobs' folder, you can move it to the corresponding test folder by executing 'gossipc -move ERROR-2001-20-08_14-28-02'.
Create RegEx ERROR-file
If you receive a 'positive' message, for example 'tardis is alive', you should define an ERROR-file that catches all similar messages. A regular expression to catch all positive pings could look like this: '[\w_\.-]+ is alive'. As a second step you have to tell gossips to use this ERROR-file for all ping groups of the test.cfg file. Otherwise gossips will not find the new ERROR-file for all hosts. Example,
*** Test_Ping ***
Configuration = ping_a|ping_b|ping_server
RegEx = [\w_\.-]+ is alive
Priority = 0
Transfer = no
*** Problem-Description ***
*** Problem-Solution ***
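Before you store such an ERROR-file, you can try the pattern against a few sample messages. The sketch below uses Python's re module, whose syntax is close to, but not guaranteed to be identical with, the one gossips uses. It writes the pattern with the '+' quantifier so that the whole host name is covered. How gossips evaluates the 'Configuration' field internally is an assumption here: the group must equal one of the '|'-separated names. The host name 'www.example.org' is made up for illustration.

```python
import re

# Values taken from the Test_Ping example.
CONFIGURATION = "ping_a|ping_b|ping_server"
REGEX = r"[\w_\.-]+ is alive"

def entry_matches(group, message):
    """Return True if the entry applies to a message sent by this group.

    Assumption: the client group must be one of the '|'-separated
    names and the RegEx is searched anywhere in the message.
    """
    groups = CONFIGURATION.split("|")
    return group in groups and re.search(REGEX, message) is not None

print(entry_matches("ping_a", "tardis is alive"))
print(entry_matches("ping_b", "www.example.org is alive"))
print(entry_matches("sun", "tardis is alive"))
```

The first two calls report a match. The third one does not, because the group 'sun' is not listed in the 'Configuration' field.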
Transfer mode
Use the 'Transfer' mode 'no' to tell gossips to stay silent when this Knowledge Base file matches a client message. Use 'mail', as in the examples in this section, to receive an email with the corresponding message.
New Knowledge Base Files
After you have created a new Knowledge Base file, you should remove all other files that match the same messages. Gossips will tell you if it finds more than one ERROR-file for a certain client message.
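To find such overlapping files yourself, you can test which patterns match one sample message. This is a rough Python sketch. The file names 'Error-A', 'Error-B' and 'Error-C' are hypothetical, and the patterns are assumed to behave in Python's re module as they do in gossips.

```python
import re

def matching_files(entries, message):
    """Return the names of all entries whose RegEx matches the message."""
    return sorted(name for name, pattern in entries.items()
                  if re.search(pattern, message))

# Hypothetical file names mapped to the content of their 'RegEx' field.
entries = {
    "Error-A": r"less than 50 M on disk /usr/home-a",
    "Error-B": r"less than \d+ M on disk /usr/home-a",
    "Error-C": r"[\w_\.-]+ is alive",
}

hits = matching_files(entries, "less than 50 M on disk /usr/home-a")
if len(hits) > 1:
    print("more than one ERROR-file matches:", ", ".join(hits))
```

Here 'Error-A' is covered by the more general 'Error-B', so it is a candidate for removal or merging.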
Merging two Knowledge Base Files
You can also use the 'merge'-command of the command-line interface to merge two ERROR-files that match the same client message. Try 'gossipc -merge Test_DiskS ERROR-FILE1 ERROR-FILE2'.
Special Knowledge Base File
You may want to specify an ERROR-file for a particular host, meaning that you would like to describe a certain Knowledge Base entry for that host only. The first thing you have to do is to isolate the host's configuration in the test.cfg file by assigning the test parameter to a new group. An example,
*** Test_DiskS ***
+sun
disk = default::400M::30min
+cdburn_host
disk = scratch::200M::30min
Now you can create a Knowledge Base file that matches the desired client message. An example,
*** Test_DiskS ***
Configuration = cdburn_host
RegEx = less than \d+ M on disk /scratch
Priority = 0
Transfer = mail
*** Problem-Description ***
the disk '/scratch' on the cd burn host is filling up.
*** Problem-Solution ***
clean up the '/scratch' partition