Monday, March 19, 2012

Penthouse level


Somewhere along the way to this level you realize that all of this data has to reside somewhere. The raw data can be zipped and stored "somewhere else". You rarely need those logs, but you probably want to save them for 90 days or so, unless regulatory requirements say differently. For instance, banking information might have to be kept for up to 10 years in some countries. So you need to start thinking about your retention policies for different types of data. And you don't have to keep everything on-line either. Store it on backups of some kind; it is cheaper than the terabytes of disk you would need to keep everything on-line. You still have the normalized data to work with, and that is in most cases all you need, if done correctly.
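A retention policy can be as simple as a lookup table plus a date comparison. Here is a minimal sketch in Python; the data types and the numbers in `RETENTION_DAYS` are assumptions for illustration (the real values come from your own policies and regulators), and `is_expired` is a made-up helper name.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days. The real numbers come from your
# own policies and regulatory requirements, not from this sketch.
RETENTION_DAYS = {
    "raw": 90,            # zipped raw logs, the "90 days or so" default
    "banking": 365 * 10,  # example of a long regulatory retention period
}

def is_expired(data_type: str, created: date, today: date) -> bool:
    """Return True when an archive is older than its retention period."""
    keep_for = timedelta(days=RETENTION_DAYS[data_type])
    return today - created > keep_for
```

A nightly job could run something like this over your archive index and move expired items off-line or delete them, depending on what the policy says.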

The products you will be working with at this level are very, very powerful and can help you with advanced reporting tools and alerting schemes. They connect to ticketing systems and monitoring suites. They are also able to correlate data from a vast range of devices and software. Usually they come with a bunch of agents/collectors/connectors/daemons and whatnot for various sources of logs, with the sole purpose of translating data from a specific device into the normalized format, which can be proprietary or an open format with open APIs so you can write your own log collector agents. Often these products give you a choice of which approach to take towards collecting logs. These include, but are not limited to, daemons or agents installed on a device, triggers in a database, wrappers or listeners, connectors on an appliance, or software collectors on a virtual server. The sharp minds in the development departments come up with many interesting solutions, and since you often can write your own agents, you will probably come up with a few new ideas!

Not every log is as easy to read as syslog, where every event is one line. Some fine vendors have decided that if the logs are multiline events they will be easier to read. Yes, if you only use that product, and don't have anything else in your data center. Some log in XML. Remember, these ideas are the best on the market, just ask their developers. With these kinds of logs your normalization agents need to do magic things. Be sure that you know all your log sources and their formats when shopping for a SIEM/SIM/SEM solution. The agents need to be able to do this magic or you will lose the correlation part in monitoring. Correlation is only effective if all pieces of the puzzle are available.
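To give an idea of what that "magic" looks like for multiline events, here is a minimal sketch in Python. It assumes that every new event starts with a timestamp like `2012-03-19 10:00:00` and that anything else is a continuation line; real products each have their own format, so the pattern and the function name `join_multiline` are illustrative only.

```python
import re

# Assumption: a new event starts with an ISO-like timestamp. Anything else
# is treated as a continuation line (stack trace, wrapped message, ...).
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def join_multiline(lines):
    """Group raw lines into events, returning one string per event."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if EVENT_START.match(line) or not events:
            events.append(line)  # start of a new event
        else:
            events[-1] += " | " + line.strip()  # glue onto the previous event
    return events
```

Once every event is one line again, it can be handed to the same normalization step as your syslog sources.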

Since you've done your homework on the levels below, you have also filtered out all the noise by now. So the result on your SOC/NOC monitors is only a few alerts per day, if any. Monthly/weekly/daily reports are mailed out to personnel and management. The bosses have their own webpage which shows red, amber and green lights for systems that are important for their focus area. All logs, from entrance systems to CCTV, Unix boxes to Windows 7 clients, routers, switches and Wi-Fi APs, from New York to Birmingham, through Cape Town, Singapore and Tokyo… OK, a big shop that is. A bit of an exaggeration but you get the drift. You are on top of things, and that's why you earned the penthouse flat.

Sunday, March 18, 2012

Purification


As discussed above, at the reporting level your reports are most likely in a basic format, with a lot of information to parse through manually. Maybe `grep` is your most important tool ("find" for you Windows techs, and "inc" for IOS wizards) in combination with magic `awk` one-liners. This means you will spend a lot of time analyzing your report (maybe) once a month, and that's all fine if you don't have hundreds of devices logging hundreds of thousands of lines per day. But at some stage you will hit a critical level where it just becomes too hard to do this manually. This is where you step into level five, reviewing.

This is the level you need to be at if you're trying to comply with regulatory requirements. If not, you will spend a lot of time explaining to the auditors why you aren't and what you have done with regards to compensating controls, which usually is harder than biting the bullet and doing this correctly from the beginning. As mentioned earlier, you do not want to eye through thousands of lines of logs every day. I would even state that it will be impossible to do that on a daily basis, so you need some help. That help is called normalization of data. What it means is that the logs you bring in and need to analyze are "purified" and funneled into a format that is easier to parse, either manually or automatically.

Your environment consists of many different devices, from different vendors. All of these vendors have their own development departments with great minds working towards giving you the ultimate experience with their products. Interestingly enough, all of them have the best way to log things, and they all have the best format to present data in. So what happens when someone is filling out a form on your web page? Your firewall will give you logs, your router/switch will give you logs, your web server and database will give you logs, and if files are created in an area checked by a file integrity monitoring tool, this will also give you logs. So one external event (form submit) will give you at least five logs, all in different formats, but with similar, correlating information. Your log management gadget should be able to normalize this data so all of the logs are shown in the same way. The exact format will depend on which gadget you choose, but this behavior is what you want. It is also important that the raw logs are maintained "as is", since they have forensic value in case of investigations and/or lawsuits needing evidence. Only the interpreted (normalized) data should be worked with, and be the base data for your reports and alerts. If you need to dig deeper into the nitty-gritty details you should of course look into the raw logs.
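Normalization can be sketched in a few lines of Python. Both input formats below are invented for illustration (they are not any real vendor's format), as are the field names of the normalized record and the function name `normalize`. Note that the raw line is carried along untouched, in the spirit of keeping the raw logs "as is".

```python
import re

# Two made-up log formats, standing in for a firewall and a web server.
FIREWALL = re.compile(
    r"^(?P<ts>\S+) fw action=(?P<action>\w+) src=(?P<src>\S+) dst=(?P<dst>\S+)$"
)
WEB = re.compile(
    r'^(?P<src>\S+) \[(?P<ts>[^\]]+)\] "(?P<method>\w+) (?P<path>\S+)" (?P<status>\d{3})$'
)

def normalize(raw: str) -> dict:
    """Translate one raw log line into a common record; keep the raw line."""
    m = FIREWALL.match(raw)
    if m:
        return {"time": m["ts"], "device": "firewall",
                "src_ip": m["src"], "event": m["action"], "raw": raw}
    m = WEB.match(raw)
    if m:
        return {"time": m["ts"], "device": "webserver", "src_ip": m["src"],
                "event": f'{m["method"]} {m["path"]} {m["status"]}', "raw": raw}
    # Unknown format: keep it, so that nothing is silently dropped.
    return {"time": None, "device": "unknown",
            "src_ip": None, "event": None, "raw": raw}
```

With both sources reduced to the same fields, the firewall and web server entries for one form submit can be lined up on `src_ip` and `time`, which is what makes the correlation possible later on.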

You may think this is weird science, and it certainly is not for the faint-hearted, but this level is actually where the fun begins. If you proceed slowly (yes, it is a mantra) and take in log source after log source, and don't try to take everything in at once, you will end up with a clear understanding of what interesting information the logs can provide, and at the same time you can filter out the noise which you normally are not interested in seeing. For instance, you don't want or need to see the form submitted in the example above in your reports; it is considered a normal traffic pattern and belongs in the noise group of information. Something out of the ordinary in submitting that form would be similar traffic but with the submitted data containing XSS attempts or SQL injections. Another example could be a sudden increase in form submitting traffic (unless you just promised free t-shirts in a campaign advertised on major TV networks).
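The split between noise and something worth a look can be sketched like this in Python. The patterns are deliberately naive and only there to show the idea; they are assumptions of this sketch, nowhere near a real XSS or SQL injection detector, and `classify` is a made-up name.

```python
import re

# Deliberately naive patterns, for illustration only. A real filter needs
# far more than three signatures.
SUSPICIOUS = re.compile(
    r"<script|union\s+select|'\s*or\s+'?1'?\s*=\s*'?1",
    re.IGNORECASE,
)

def classify(submitted_data: str) -> str:
    """Return "alert" for suspicious form data, "noise" for everything else."""
    return "alert" if SUSPICIOUS.search(submitted_data) else "noise"
```

Everything classified as noise stays out of your reports, and the few hits that remain are the lines you actually read.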

And now you are dangerously close to stepping up to the monitoring level...

Saturday, March 17, 2012

What if?


Then one morning shit hits the fan, and it's all over the place. Something goes terribly wrong and production comes to a distinct halt. It takes ages to file a new order. The Intranet page just won't load. Local shares won't mount. Remote Desktop Connections time out. SSH does not work… OK, this is not good, by any measure. And yes, this is an exaggeration; you probably won't see the above happening all at once unless someone blew up your data centers or you've had a very, very malicious attack where someone erased the lot.

In one way or another you have to access your devices, be it through ILO, KVM or good old RS-232, and some serious digging into your logs (that are hopefully collected) has to be done. Following your incident response plan (you have one, right?) you call in experts in several fields to help you chase traces of faults and translate various logs from all your systems. These experts might be in-house specialists, or you could buy this expertise from consultants. The point being, you just won't cope with logging in to all of your boxes in reasonable time and looking for suspicious lines of logged events, even if the number of devices is counted in the tens. So get some help.

By now you are at the investigation level. The likelihood of missing logs, or logs not containing enough data, is probably closer to 1 than 0. You decide that logging would have been a nice thing to have, and start configuring devices and applications to log, at least locally, and you are slowly starting to embrace the thought of collecting the logs at some kind of central repository. Let's be straight and honest - log management is a must-have. It doesn't matter how you do it, but you need to configure logging on your systems. If you can save the logs centrally, investigations will be so much easier.

With that in mind, and with a serious incident that cost the company a lot of money giving you a very good incentive, you go ahead and ask for money to invest in some kind of log collecting gadget, which also has rudimentary capabilities to parse the log data automatically. Of course there is a possibility that the incident wasn't serious enough for your investment board to consider investments, or your boss is reluctant to buy something that will jeopardize his budget (which of course will affect his bonus). If that is the case there are options that are free to download. Some are Open Source, and some vendors offer free versions of their commercial programs with some limitations in volume; functionality is usually the same as in the commercial product. Welcome to level four, reporting!

Friday, March 16, 2012

Why do you need logs?


Logs are needed for different reasons. Your company may be a shop of developers, so you need a repository for your code that records who checks out and who checks in which part of the code. Whichever version control system you use, it needs to keep track of the different versions of code the developers produce. Look at it as a log system within the application. A competent system like that most definitely has logging capabilities (if not, throw it out now), so that is easy. Who did what at which point of time?

That application needs to log who checked out which portions of code at which point of time, since if something out of the ordinary should happen, like a buffer overflow in the Linux kernel, it would be easy to see exactly when the faulty code was introduced and by whom. The "who" part is (hopefully) not there for blame reasons, but for educational ones, as in "we all learn from errors". Still, the main reason for logging, and preserving the logs, is traceability.

The example above can be seen as a "Use Case". Use cases are a good way to understand where and what your needs are. What do you need to record, and on which grounds? Use cases can consist of almost anything, but the most important issue here is that the case will give your business something of value. Something that can be applied to your particular situation and that needs to be followed up on a regular basis. This could be something in the way of (PCI-DSS) "where does my e-commerce data reside and flow?" (i.e. credit card numbers), or "who created an admin user in database x?", or "how come a user logged in from three different sites within five minutes?".
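The last use case translates quite directly into code once the logs contain a timestamp, a user and a site. Here is a minimal sketch in Python; the tuple layout, the function name `multi_site_logins` and the sample data are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def multi_site_logins(events, min_sites=3, window=timedelta(minutes=5)):
    """Return the users who logged in from `min_sites` or more different
    sites within `window`. `events` holds (timestamp, user, site) tuples."""
    by_user = defaultdict(list)
    for ts, user, site in sorted(events):
        by_user[user].append((ts, site))

    flagged = set()
    for user, logins in by_user.items():
        for i, (start, _) in enumerate(logins):
            # Distinct sites seen within the window that opens at `start`.
            sites = {site for ts, site in logins[i:] if ts - start <= window}
            if len(sites) >= min_sites:
                flagged.add(user)
                break
    return flagged

# Made-up sample data: alice shows up at three sites within four minutes.
start = datetime(2012, 3, 16, 9, 0)
sample = [
    (start, "alice", "New York"),
    (start + timedelta(minutes=2), "alice", "Tokyo"),
    (start + timedelta(minutes=4), "alice", "Cape Town"),
    (start, "bob", "Singapore"),
    (start + timedelta(minutes=30), "bob", "Tokyo"),
]
print(multi_site_logins(sample))
```

The point is not the code itself but what it needs: if your logs don't record the user, the site and a trustworthy timestamp, the use case can't be answered at all.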

For some reason you need logs from parts of your environment and you need to work out a few good use cases to have as a starting point. Once that is done you have a picture of what you want to see in a report. The next step is to look at your logs. Is there enough information in your logs to achieve what you need? If there is - fine. If there isn't - uh-oh… Don't despair, there is usually a solution. 

Let's look at syslog, built in to all Unix-like systems, routers, switches, firewalls and a lot of other devices. It comes with different levels of logging (0-7) from/to different facilities, and events can be logged locally or remotely. Usually level INFO (6) can be considered a sufficient level of logging for investigations (only debug creates more output, sometimes a ridiculous amount of output). There are several products for the Windows platform that convert event logs to syslog and send them on to a central syslog repository (or several) where the logs can be saved and parsed on demand.
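On the wire, syslog packs facility and level into one number at the start of each message, the priority value (`<34>` and the like), calculated as facility times 8 plus severity. A minimal sketch in Python of how to take it apart; the function names `decode_pri` and `wanted` are made up for this example, while the severity keywords follow the usual syslog naming.

```python
# Severity keywords for levels 0-7, lowest number = most severe.
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]

def decode_pri(pri: int):
    """Split a syslog priority value into (facility number, severity name)."""
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]

def wanted(pri: int, max_level: int = 6) -> bool:
    """Keep everything up to and including INFO (6); drop debug (7)."""
    return pri % 8 <= max_level
```

For example, `decode_pri(34)` gives facility 4 with severity `crit`, and `wanted(15)` is false because 15 is facility 1 at debug level.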

Most applications have logging built in to some extent. It is up to the vendor to decide what is logged, to which level, and whether this level can be configured or not. That is a shame. First and foremost it should be up to the user to decide what needs to be logged, not the vendor. But as this is not the case you need to either accept it, file a bug report, or take radical decisions such as throwing out the product or wrapping it up in some kind of sandbox which can take care of logging the input and output. The problem with "accept it" is that the product in scope probably will not meet the requirements that you, your organization or the auditor have, and one of those may have a different opinion on the matter.

If you need logging to the extreme Big Brother experience where you log everything from an employee using the card to get into the premises, passing the cafeteria area, pouring a cup of double espresso, entering the work area, logging into his account, opening [x,y,z] applications, surfing to [a,b,c…] doing this and that until lunch, locking the computer screen, going for lunch… you see where this is going. You will probably end up throwing out a lot of your existing software to meet your needs. It's doable, but does it justify the cost?

This leads you to level 2, Collection; you are collecting logs, but you're really not looking at the logs. Ever. But you're collecting logs, which is a good thing in itself. So, Good On You!

Thursday, March 15, 2012

Choosing Mr. Right


As you can imagine from the above, whatever products you choose, do not trust a vendor that promises "Plug it in and you will be up and running in ten minutes!". This will not happen. You will need to tweak and configure until your fingers bleed, your eyes are blood-shot from crying, and your vocabulary is out of new and innovative cursing as well as the old and proven four-letter words before you feel things are under control. Sometimes that feeling is justified, as in: you're on the right track. The rest of the time it is just a false prediction of what lies ahead in your adventure with log collection and reporting.

Now, don't let these words put you off. Once you've overcome the first hurdles and caveats you might experience the beauty and necessity of logs. Sounds crazy, but when you understand the heartbeats and almost organic life that is going on in your data centers you will see the benefits of your correlated and filtered log data, and the advantage it gives your organization in reacting almost proactively to those alerts and reports you are able to squeeze out of the hundreds of millions of events that might occur during a normal day at an average site with a few active databases.

Your mission is to choose the right product(s) for your needs. So you need to pin down your needs first. In cases of regulatory requirements it's pretty straightforward. Take PCI-DSS (I'll use that as an example since I am very familiar with those requirements after working with them for years): the requirements (PCI-DSS 2.0) clearly state that you need to collect logs and see to it that they can't be tampered with. There are also requirements for how often logs are to be reviewed. So you need to decide to what extent you want to obey, or follow, these requirements. As you will see later, maybe it is a good idea to look beyond the requirements and not only comply, but also look at what good the requirements will do for you and what advantage you can gain from using them as an instrument in your daily work environment.

Let's start with your requirements. Don't worry about others' at this stage. This is your mission and your requirements come first, and only later will we take other requirements into consideration. It might just be that they walk hand in hand, or need just a little tweak or persuasion so they will tango.

The main mantra in logging is:

Who did what at which point of time? Who did what at which point of time? Who did what at which point of time? 

Is that clear enough? In one word: Traceability.

Wednesday, March 14, 2012

Summary


See the above descriptions as examples; there are as many sub-levels as there are implementations and interpretations of needs and requirements. The examples are nonetheless taken from real life and compiled into short descriptions of what it can look like, and they give you pointers to what you need to do depending on which confidence level you want to reach. The examples do not give any hints or recommendations for products you may or may not need, since it is a very fast moving market, and acquisitions, mergers and trends move too rapidly for this document. You'll need to do some research to find out what is suitable for you and what the different vendors offer today. There are several different aspects you need to keep track of when comparing products. You need to choose something that is right for your organization, and hopefully you'll find something that suits your needs and has room and capabilities to scale and expand with your company's strategies and goals.

Tuesday, March 13, 2012

6. Log monitoring


Punch that stick into gear number six! Now we're talking business. This is the level where you know what kind of environment you're living in, you know which logs are being collected, you know they are the right ones, and you know your filters work. You probably have alerts and alarms in place. Correlation of logs from different devices is in place and dashboards blink and beep at your 24/7 SOC. Your data centers are equipped with log collectors whose output is filtered into easily interpreted dashboards or lists, with relevant information displayed on screens in the SOC. Events that are important, but not important enough to be displayed on those screens, result in alerts that show up in your monitoring software, and maybe even your ticketing system. In some cases everything is connected to your central configuration management database. Backups are stored off site in a tamper-proof environment. Reports are easily retrievable in case of need. Some of them are automatically sent to your management team and others to various technical teams. When auditors come in you press a button and give them your compliance report. If something out of order happens your SOC alerts on-call personnel to solve problems that are not possible for the NOC to straighten out.
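Correlation sounds fancier than it has to be. At its simplest it means grouping normalized events from different devices that share a key and sit close together in time. Here is a minimal sketch in Python; the record fields (`time`, `src_ip`, `device`), the ten-second window and the function name `correlate` are assumptions for illustration, and real products use far richer rules.

```python
from datetime import datetime, timedelta

def correlate(events, window=timedelta(seconds=10)):
    """Group normalized events that share a source IP and occur within
    `window` of the first event in their group."""
    groups = []
    current = {}  # src_ip -> the group currently being filled for that IP
    for ev in sorted(events, key=lambda e: e["time"]):
        group = current.get(ev["src_ip"])
        if group and ev["time"] - group[0]["time"] <= window:
            group.append(ev)
        else:
            group = [ev]  # too late for the old group: start a new one
            current[ev["src_ip"]] = group
            groups.append(group)
    return groups

# Made-up sample: one form submit seen by three devices, plus strays.
t0 = datetime(2012, 3, 13, 8, 0, 0)
sample = [
    {"time": t0, "src_ip": "203.0.113.7", "device": "firewall"},
    {"time": t0 + timedelta(seconds=1), "src_ip": "203.0.113.7", "device": "webserver"},
    {"time": t0 + timedelta(seconds=1), "src_ip": "198.51.100.9", "device": "webserver"},
    {"time": t0 + timedelta(seconds=2), "src_ip": "203.0.113.7", "device": "database"},
    {"time": t0 + timedelta(minutes=5), "src_ip": "203.0.113.7", "device": "firewall"},
]
for group in correlate(sample):
    print([ev["device"] for ev in group])
```

A group with firewall, web server and database entries is the normal form submit; a group where one of the expected devices is missing is the kind of thing you want on the SOC screen.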