I might be putting the cart before the horse here: I had intended to post about Sumo Logic before writing this, but I feel that this particular post is probably more useful immediately. You can find information on setting up ephemeral mode in the Sumo Logic support pages, but when I was looking I found it wasn’t all in a single location. That got frustrating quickly, so once I had worked out how it needed to be set up, I thought I’d post it so that others don’t need to spend time on it.
Quickly, Sumo Logic is a cloud-based event log correlation and analytics service: think of it as a cloud-based Splunk, but with less ridiculous pricing. Someone will surely point out Splunk Storm, but that isn’t really all that good in my experience. I’d suggest you try both and you’ll work out why pretty quickly. The company is relatively new, but has some pretty decent backers in the form of Accel Partners and Sequoia Capital (amongst others), both of whom have reasonably decent reputations in the venture capital space. Technology-wise, Sumo Logic specialises in cloud-based event log management and analysis for large data sets. You can feed it a ton of data, query it, and generate usable information that helps with operational support of applications or environments. Its primary benefits are that it scales rapidly to very large data sets and that its search is near real time.
Sumo Logic operates with hosted and installed collectors. The hosted collector integrates with Amazon S3 to fetch log data for ELB, CloudTrail and the like. This lets you process that data without having to deploy an EC2 instance to collect it and forward it into Sumo Logic’s systems. The installed collector is just that: a collector that’s installed on a system. You’d use an installed collector when you’re fetching data from Windows Event logs or IIS logs, or even when acting as a syslog collector for network devices. The installed collector then relays the collated data back to Sumo Logic over a secure connection. In the configuration of your Sumo Logic account you see all your collectors, be they hosted or installed. I’ll post more about it another time, but for now I’d suggest you have a play.
Now that we’ve covered what Sumo Logic is and a high-level idea of how it collects data, what is ephemeral mode? Ephemeral mode is exactly what the definition of the word suggests: “lasting for a very short time”. It’s a configuration setting that allows Sumo Logic to receive data from a collector that isn’t permanent and will need to be expired or cleaned up once it stops working. Basically, the collector starts up when the machine starts, collects data and sends it back to Sumo Logic. On machine termination, rather than leaving an offline collector visible in the configuration panel, the collector is automatically removed after 12 hours of not sending any messages.
In my case, I wanted to deploy it in an Amazon EC2 environment where I was using Auto Scaling to ramp up the number of instances processing data and writing to a log file for each job completed. I could have centralised the logging data on a network share, but I found I was running into cases where a write would not complete and I’d end up losing some data. This meant that I couldn’t track the progress and performance of job processing as accurately as I’d like, which in turn made it harder for me to perform proper capacity management. This is a relatively simple use case, but imagine you had a front-end web farm that was joined to AD, with servers being spun up and destroyed based on health or performance requirements. You might want to track all the event log data and IIS logs, but you’d be hard pressed to do so easily unless you had some tool to collect all the data for you and forward it on. For the purposes of this article, I’ll use this as my example.
Setting Ephemeral mode
Technically speaking, the information on setting up ephemeral mode is pretty easy to find. The problem is that it doesn’t tell you about the additional steps involved in the actual configuration. As it turns out, you need to do a few things:
- Create a folder on the C drive called “sumo”
- Create a file called “sources.json” with the data you want collected, save it in C:\sumo
- Create a file called “sumo.conf” with the relevant configuration, save it in C:\sumo
- Install the Sumo Collector with the -q switch
The first item is pretty simple. Create a folder on your C drive called “sumo”. If you’re having problems with this step, I can’t help you.
I got lazy here and couldn’t be bothered taking the screenshot from a test system, so I took it from a live environment and edited the image to redact certain folders.
Creating a sources.json is pretty straightforward, but be aware that you need to escape the backslashes in paths on Windows systems. As per the example, we have to collect Event log and IIS log data, so we need to create a sources.json to reflect this:
"defaultDateFormat": "dd/MMM/yyyy HH:mm:ss"
"defaultDateFormat": "dd/MMM/yyyy HH:mm:ss"
Note the escaped path for the pathExpression? That’s important, or else you won’t get IIS logs coming through. Also note that you need to set your categories in the configuration as well. Not doing so will make it harder to search by data type, and reduce your ability to use Sumo Logic effectively.
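If you’d rather not escape the path by hand, here is a minimal sketch in Python that generates a sources.json for the Event log and IIS example. Only pathExpression, the categories and the defaultDateFormat value come from this post; the other field names (api.version, sourceType, logNames, name) are what I understand Sumo Logic’s JSON source format to use, and the source names, category values and IIS log path are placeholders, so treat them all as assumptions and check them against the current documentation. The useful part is that json.dumps doubles the backslashes for you.

```python
import json


def build_sources(iis_log_glob):
    """Describe one Windows Event log source and one IIS log file source.

    Field names other than pathExpression, category and defaultDateFormat
    are assumptions about Sumo Logic's JSON source format.
    """
    return {
        "api.version": "v1",
        "sources": [
            {
                "sourceType": "LocalWindowsEventLog",
                "name": "Windows Event Logs",        # placeholder name
                "category": "OS/Windows/Events",     # placeholder category
                "logNames": ["Application", "System", "Security"],
            },
            {
                "sourceType": "LocalFile",
                "name": "IIS Logs",                  # placeholder name
                "category": "IIS/Access",            # placeholder category
                "pathExpression": iis_log_glob,
                "defaultDateFormat": "dd/MMM/yyyy HH:mm:ss",
            },
        ],
    }


def render_sources(iis_log_glob):
    """Serialise the sources; json.dumps escapes each backslash as \\\\."""
    return json.dumps(build_sources(iis_log_glob), indent=2)


if __name__ == "__main__":
    # Hypothetical IIS log location; adjust to wherever your site logs.
    print(render_sources(r"C:\inetpub\logs\LogFiles\W3SVC1\*.log"))
```

Pass the path as a raw string and let the serialiser worry about the escaping. The output can be saved as C:\sumo\sources.json.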
Now we need to create the sumo.conf file:
Hey look, ephemeral is equal to true! This is where ephemeral mode is set. I forgot to mention earlier that you can install collectors and tell them to authenticate with access id/key pairs as opposed to your login (I’ve typically broken up my access id/keys based on server types, more for easier housekeeping than anything else). It’s a little more secure to use id/key pairs than your email login, so I’m going to suggest you do that. Again, notice the escaped path for sources.json; if you don’t do this, you won’t get your configured data sources.
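As a rough sketch of what goes into sumo.conf, the Python below builds the file contents from a collector name, an access id/key pair and the path to sources.json. The ephemeral=true line and the escaped sources path are from this post; the exact key names (name, accessid, accesskey, sources) are my understanding of the sumo.conf format and should be treated as assumptions, and the collector name and credentials in the example are obviously placeholders.

```python
def render_sumo_conf(name, access_id, access_key, sources_path):
    """Build the text of a sumo.conf that enables ephemeral mode.

    Key names are assumptions about the sumo.conf format; verify them
    against Sumo Logic's documentation for your collector version.
    """
    # Double every backslash so the Windows path is read correctly.
    escaped_path = sources_path.replace("\\", "\\\\")
    lines = [
        f"name={name}",
        f"accessid={access_id}",
        f"accesskey={access_key}",
        f"sources={escaped_path}",
        "ephemeral=true",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; never commit real keys to source control.
    print(render_sumo_conf("web-frontend", "YOUR_ACCESS_ID",
                           "YOUR_ACCESS_KEY", r"C:\sumo\sources.json"))
```

On an Auto Scaling group you’d probably want the name to include something unique per instance so that collectors are easy to tell apart, but that is a design choice rather than a requirement.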
The last bit of all this is running the collector installer with the -q switch. When creating an Auto Scaling Launch Configuration, I use the User Data field to pass through a command to run a PowerShell script by wrapping it with <powershell></powershell>. The PowerShell script calls “C:\Bootstrap\Executables\SumoCollector_windows-x64_19_95-10.exe -q”. The executable runs and installs with the default options. On first start, it reads “sumo.conf” from “C:\sumo”, which sets the basic configuration of ephemeral mode and provides the access credentials and the path to the relevant sources in the form of “sources.json”. Once that’s done, it starts collecting data and funnelling it up to Sumo Logic. This is admittedly very AWS EC2 focused, but the commands are adaptable to pretty much any other platform.
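To make the User Data part concrete, here is a small Python sketch that builds the string you would paste (or pass via your tooling of choice) into the User Data field. It assumes the installer is invoked directly from the PowerShell block rather than through a separate script file, which is a simplification of what I described above; the function name is made up for illustration.

```python
def build_user_data(installer_path):
    """Wrap a quiet collector install in the <powershell> tags used in
    EC2 User Data for Windows instances.

    Assumption: the installer is called directly with the -q switch
    instead of via a separate bootstrap script.
    """
    # The call operator (&) lets PowerShell run a quoted executable path.
    command = f'& "{installer_path}" -q'
    return f"<powershell>\n{command}\n</powershell>"


if __name__ == "__main__":
    print(build_user_data(
        r"C:\Bootstrap\Executables\SumoCollector_windows-x64_19_95-10.exe"
    ))
```

The sumo.conf and sources.json files need to be in C:\sumo before this runs, so bake them into the image or write them out earlier in the same script.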