Purification
As discussed above, at the reporting level your reports are most likely in a basic format, with a lot of information to parse through manually. `grep` is probably your most important tool ("find" for you Windows techs, and "inc" for IOS wizards), in combination with magic `awk` one-liners. This means you will spend a lot of time analyzing your report, maybe once a month, and that's all fine if you don't have hundreds of devices logging hundreds of thousands of lines per day. But at some stage you will hit a critical level where it just becomes too hard to do this manually. This is where you step into level five, reviewing.
This is the level you need to be at if you're trying to comply with regulatory requirements. If you are not, you will spend a lot of time explaining to the auditors why not and what you have done with regard to compensating controls, which is usually harder than biting the bullet and doing this correctly from the beginning. As mentioned earlier, you do not want to eyeball thousands of lines of logs every day. I would even say it is impossible to do that on a daily basis, so you need some help. That help is called normalization of data. It means that the logs you bring in and need to analyze are "purified" and funneled into a format that is easier to parse, either manually or automatically.
Your environment consists of many different devices from different vendors. All of these vendors have their own development departments with great minds working towards giving you the ultimate experience with their products. Interestingly enough, all of them have the best way to log things, and they all have the best format to present data in. So what happens when someone fills out a form on your web page? Your firewall will give you logs, your router/switch will give you logs, your web server and database will give you logs, and if files are created in an area checked by a file integrity monitoring tool, that will give you logs too. So one external event (a form submit) will give you at least five logs, all in different formats, but with similar, correlating information. Your log management gadget should be able to normalize this data so that all of the logs are shown in the same way. The exact format will depend on which gadget you choose, but this is the behavior you want. It is also important that the raw logs are kept "as is", since they have forensic value in case of investigations and/or lawsuits needing evidence. Only the interpreted (normalized) data should be worked with and used as the base data for your reports and alerts. If you need to dig deeper into the nitty-gritty details, you should of course look into the raw logs.
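To make the idea of normalization a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two sample log lines, the field names (`time`, `source`, `src_ip`, `action`, `detail`, `raw`) and the function names are hypothetical and are not taken from any particular log management product. The point is only that two differently formatted lines end up in one common shape, while the untouched raw line is kept alongside.

```python
import re
from datetime import datetime, timezone

# Hypothetical web server line, Common Log Format style.
WEB_LINE = ('203.0.113.7 - - [01/May/2024:12:15:03 +0200] '
            '"POST /contact HTTP/1.1" 200 512')
# Hypothetical firewall line made of key=value pairs.
FW_LINE = ('time=2024-05-01T10:15:02+00:00 action=allow '
           'src=203.0.113.7 dst=192.0.2.10 dport=443')

WEB_RE = re.compile(
    r'(?P<src>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
KV_RE = re.compile(r'(\w+)=(\S+)')


def normalize_web(raw):
    """Map a web server line to the common record, or None if it does not parse."""
    m = WEB_RE.match(raw)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "time": ts.astimezone(timezone.utc).isoformat(),  # always UTC
        "source": "webserver",
        "src_ip": m["src"],
        "action": m["method"],
        "detail": m["path"],
        "raw": raw,  # the original line is kept untouched
    }


def normalize_firewall(raw):
    """Map a key=value firewall line to the common record, or None if it does not parse."""
    fields = dict(KV_RE.findall(raw))
    if not {"time", "src", "action"} <= fields.keys():
        return None
    ts = datetime.fromisoformat(fields["time"])
    return {
        "time": ts.astimezone(timezone.utc).isoformat(),
        "source": "firewall",
        "src_ip": fields["src"],
        "action": fields["action"],
        "detail": "dst={}:{}".format(fields.get("dst", "?"), fields.get("dport", "?")),
        "raw": raw,
    }


if __name__ == "__main__":
    for record in (normalize_firewall(FW_LINE), normalize_web(WEB_LINE)):
        print(record["time"], record["source"], record["src_ip"], record["action"])
```

Once both lines share the same fields and the same time zone, correlating them (same source address, one second apart) becomes a simple comparison instead of a `grep` and `awk` exercise per vendor format.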
You may think this is weird science, and it certainly is not for the faint-hearted, but this level is actually where the fun begins. If you proceed slowly (yes, it is a mantra) and take in log source after log source, rather than trying to take everything in at once, you will end up with a clear understanding of what interesting information the logs can provide, and at the same time you can filter out the noise you normally are not interested in seeing. For instance, you don't want or need to see the form submission from the example above in your reports; it is a normal traffic pattern and belongs in the noise group. Something out of the ordinary would be similar traffic where the submitted data contains XSS attempts or SQL injections. Another example could be a sudden increase in form submission traffic (unless you just promised free t-shirts in a campaign advertised on major TV networks).
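As a rough illustration of separating noise from the interesting stuff, here is a small Python sketch. It assumes the events have already been normalized into dictionaries with hypothetical `time` (ISO 8601 string) and `detail` (submitted data) fields. The regular expression is deliberately naive and only stands in for a real detection ruleset, and the "three times the median" spike threshold is an arbitrary choice for the example, not a recommendation.

```python
import re
from collections import Counter
from statistics import median

# Deliberately naive, illustrative patterns; NOT a real detection ruleset.
SUSPICIOUS = re.compile(
    r"<script|\bunion\b.+\bselect\b|'\s*or\s+'?1'?\s*=\s*'?1",
    re.IGNORECASE,
)


def is_interesting(event):
    """True if the submitted data looks like an XSS or SQL injection attempt."""
    return bool(SUSPICIOUS.search(event["detail"]))


def spike_minutes(events, factor=3.0):
    """Return the minutes whose event count exceeds factor times the median count."""
    # "2024-05-01T10:02:17+00:00"[:16] -> "2024-05-01T10:02"
    per_minute = Counter(e["time"][:16] for e in events)
    if not per_minute:
        return []
    baseline = median(per_minute.values())
    return sorted(m for m, n in per_minute.items() if n > factor * baseline)


if __name__ == "__main__":
    events = [
        {"time": "2024-05-01T10:00:05+00:00", "detail": "name=Alice&size=L"},
        {"time": "2024-05-01T10:01:40+00:00", "detail": "name=<script>alert(1)</script>"},
        {"time": "2024-05-01T10:01:59+00:00", "detail": "name=' or '1'='1"},
    ]
    for e in events:
        label = "REVIEW" if is_interesting(e) else "noise"
        print(label, e["detail"])
```

The normal form submit falls into the noise group and stays out of the report, while the two tampered submissions are flagged for review. In practice you would tune both the patterns and the threshold per log source, one source at a time.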
And now you are dangerously close to stepping up to the monitoring level...