I attended the Splunk Live! London event last Thursday. I am currently assessing Splunk and its suitability as a security SIEM (Security Information and Event Management) tool, in addition to being a general data collection and correlation tool. During the day I made various notes that I thought I would share. I’ll warn you up front that these are relatively unformatted, as they were taken during the talks on the day.
Before I cover off the day, I should highlight that I use the term SIEM to relate to the process of Security Information and Event Management, NOT SIEM ‘tools’. Most traditional tools labelled as SIEM are inflexible, do not scale in this world of ‘big data’ and are only usable by the security team. This for me is a huge issue and a waste of resources. SIEM as a process is performed by security teams every day and will continue to be performed whichever big data tool is chosen.
The background to my investigating Splunk is that I believe a business should have a single log and data collection and correlation system that gets literally everything, from applications to servers to networking equipment to security tool logs / events etc. This then means that everyone from Ops to application support, to the business, to security can use the same tool and be assured of a view encompassing the entire environment. Each set of users would have different access rights and custom dashboards in order for them to perform their roles.
From a security perspective this is the only way to ensure the complete view that is required to look for anomalies and detect intelligent APT (Advanced Persistent Threat) type attacks.
Having a single tool also has obvious efficiency, management and economies of scale benefits over trying to run multiple largely overlapping tools.
Onto the notes from the day:
Volume – Velocity – Variety – Variability = Big Data
Machine generated data is one of the fastest growing, most complex and most valuable segments of big data.
Real time business insights
Search and investigation
Enables move from ‘break fix’ to real time operations insight (including security operations).
GUI to create dashboards – write queries and select how to have them displayed (list, graph, pie chart etc.). Can move things around on the dashboard with drag and drop.
Dev tools – REST API, SDKs in multiple languages.
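As a rough illustration of the REST API note above, here is a minimal Python sketch that builds (but does not send) a search request. It assumes the management port 8089, the `/services/search/jobs/export` endpoint and HTTP basic authentication; the host name, credentials and query are made up for the example, so check the details against the Splunk REST API reference before relying on them.

```python
import base64
import urllib.parse
import urllib.request


def build_export_request(base_url, username, password, query, earliest="-15m"):
    """Build (but do not send) a POST request for a streaming search export.

    Assumption: the endpoint path and parameter names follow the Splunk
    REST API as I understand it; verify them for your version.
    """
    # Plain searches sent over REST are expected to start with the "search" command.
    if not query.lstrip().startswith(("search ", "|")):
        query = "search " + query
    body = urllib.parse.urlencode({
        "search": query,
        "earliest_time": earliest,
        "output_mode": "json",
    }).encode("ascii")
    token = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/services/search/jobs/export",
        data=body,
        method="POST",
    )
    request.add_header("Authorization", "Basic " + token)
    return request


# Hypothetical host and credentials, for illustration only.
req = build_export_request(
    "https://splunk.example.com:8089", "u", "p", "index=web status=404"
)
```

Passing `req` to `urllib.request.urlopen` would actually run the search; the sketch stops short of that so it can be read and tested without a Splunk server.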
More data in = more value.
My key goal for the organisation – One log management / correlation solution – ALL data. Ops (apps, inf, networks etc.) and Security (inc PCI) all use same tool with different dashboards / screens and where required different underlying permissions.
Many screens and dashboards are available free (some, like PCI and Security, cost). The dashboards’ look and feel helps users feel at home and get started quickly – e.g. VM dashboards look and feel similar to the VMware interface.
Another example – Windows dashboard – created by Windows admins, not Splunk – all the details they think you need.
Exchange dashboard – includes many Exchange details around message rates and volumes etc., also includes things like outbound email reputation.
VMware – can go down to specific guests and resource use, as well as host details (file use, CPU use, memory use etc.).
Can pivot between data from VMware and email etc. to troubleshoot the cause of issues.
These are free – download from Splunkbase.
Can all be edited if not exactly what you need, but are at least a great start.
Developers – from tool to platform – can both support development environments and be used to help teach developers how to create more useful log file data.
Security and Compliance – threat levels growing exponentially – cloud, big data, mobile etc. – the unknown is what is dangerous – move from known threats to unknown threats.
Wired – the internet of things has arrived, and so have massive security threats
Security operations centre, Security analytics, security managers and execs
- Enterprise Security App – security posture, incident review, access, endpoint, network, identity, audit, resources..
Look for anomalies – things someone / something has not done before
- can do things like create tasks, take ownership of tasks, report progress etc.
- When drilling down on issues there are contextual pivot points – e.g. right click on a host name and asset search, Google search, drill down into more details etc.
- Even though it costs, like all dashboards it is completely configurable.
Splunk App for PCI compliance – Continuous real time monitoring of PCI compliance posture, Support for all PCI requirements (12 areas), State of PCI compliance over time, Instant visibility on compliance status – traffic lights for each area – click to drill down to details.
- Security prioritisation of in-scope assets
- Removes much of the manual work from PCI audits / reporting
Application management dashboard
- Splunk can do math – what is average stock price / how many users on web site in last 15 minutes etc.
- Real time reporting on impact of marketing emails / product launches and changes etc.
- for WP – reporting on transaction times, points of latency etc – enable focus on slow or resource intensive processes!
- hours / days / weeks to create whole new dashboards, not months.
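To make the "can do math" note above concrete, here is a small Python sketch of the two calculations mentioned: an average price and a count of distinct users in the last 15 minutes. This is not Splunk's search language; the event fields (`time`, `user`, `price`) and the sample data are invented for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean


def average_price(events):
    """Mean of the 'price' field across events that carry one."""
    prices = [e["price"] for e in events if "price" in e]
    return mean(prices) if prices else None


def active_users(events, now, window=timedelta(minutes=15)):
    """Number of distinct users seen between (now - window) and now."""
    cutoff = now - window
    return len({
        e["user"] for e in events
        if "user" in e and cutoff <= e["time"] <= now
    })


# Made-up sample events; the timestamps are arbitrary.
now = datetime(2020, 1, 1, 12, 0)
events = [
    {"time": now - timedelta(minutes=2), "user": "alice", "price": 10.0},
    {"time": now - timedelta(minutes=5), "user": "bob", "price": 12.0},
    {"time": now - timedelta(minutes=9), "user": "alice", "price": 14.0},
    {"time": now - timedelta(minutes=40), "user": "carol", "price": 20.0},
]

print(average_price(events))      # average over all four events
print(active_users(events, now))  # carol falls outside the 15 minute window
```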
Links with Google earth – can show all customer locations on a map – are we getting connections from locations we don’t support, where / what are our busiest connections / regions.
Industrial data and the internet of things; airlines, medical informatics (electronic health records – mobile, wireless, digital, available anywhere to the right people – were used to putting pads down, so didn’t get charged – Splunk identified this).
Small data, big data problem (e.g. not all big data is actually a massive data volume, but may be complex, rapidly changing, difficult to understand and correlate between multiple disparate systems).
Barclays – 10TB of security data per year.
HPC – 10TB per day
Trading – 10TB per day
VM – >10TB per year
All via Splunk.
DataShift – Social networking ‘ETL’ with Splunk. ~10TB new data today
Afternoon sessions – Advanced(ish) Splunk.
– Can create lookup / conversion tables so log data can be turned into readable data (e.g. HTTP error codes read as page not found etc. rather than a number) This can either be automatic, or as a reference table you pipe logs through when searching.
– As well as GUI for editing dashboards, you can also directly edit the underlying XML
– Can have lots of saved searches, should organise them into headings or dashboards by use / application or similar for ease of use.
– Simple and advanced XML – simple has menus, drop downs, drag and drop etc. Advanced requires you to write XML, but is more powerful. Advice is to start in simple XML, get layout, pictures etc. sorted, then convert to advanced XML if any more advanced features are required.
– Doughnut chart – like a pie chart with inside and outside layers – good if you have a high level grouping, and a lower level grouping – can have both on one chart.
– Can do a rolling, constantly updating dashboard – built in real time option to refresh / show figures for every xx minutes.
- replicate indexes
- gives HA, gives fidelity, may speed up searches
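The lookup / conversion tables mentioned in the afternoon notes can be pictured with a short Python sketch: a small CSV reference table maps raw HTTP status codes to readable text, and each event is "piped through" it. The CSV contents, field names and the `enrich` helper are assumptions made for this example, not Splunk's own lookup format.

```python
import csv
import io

# Hypothetical reference table; a real one would live in a file.
LOOKUP_CSV = """status,description
200,OK
404,Page not found
500,Internal server error
"""


def load_lookup(text):
    """Read a two-column CSV into a dict keyed by status code."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["status"]: row["description"] for row in reader}


def enrich(event, lookup):
    """Return a copy of the event with a readable status description added."""
    enriched = dict(event)
    enriched["status_description"] = lookup.get(event.get("status"), "Unknown")
    return enriched


lookup = load_lookup(LOOKUP_CSV)
print(enrich({"uri": "/missing", "status": "404"}, lookup))
```

Codes that are not in the table fall back to "Unknown", so a gap in the reference data is visible rather than silently dropped.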
Advanced admin course;
- can accelerate a qualifying report – more efficiently run large reports covering wide date ranges
- must be in smart or fast mode
Lots of free and up to date training is available via the Splunk website.
Splunk for security
Investigation / forensics – Correlation, fast to root cause, look for APTs, investigate and understand false positives
Splunk can have all original data – use as your SIEM – rather than just sending a subset of data to your SIEM
Unknown threats – APT / malicious insider
- “normal” user and machine data – includes “unknown” threats
- “security” data or alerts from security products etc. – “known” security issues. Misses many issues.
Add context – increases value and chance of detecting threats. Business understanding and context are key to increasing value.
Get both host and network based data to have best chance of detecting attacks
Identify threat activity
- what is the modus operandi
- who / what are most critical people and data assets
- what patterns and correlations of ‘weak’ signals in normal IT activities would represent abnormal activity?
- what in my environment is different / new / changed
- what deviations are there from the norm
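One simple way to picture "deviations from the norm" and "things not done before" is a per-user baseline check. The Python sketch below flags a user whose activity count today is far from their historical mean, or who has no history at all. The threshold of three standard deviations, the data shapes and the function name are my assumptions for illustration; real detection would need far richer context.

```python
from statistics import mean, pstdev


def find_deviations(history, today, threshold=3.0):
    """Flag users whose count today is new or far from their own baseline.

    history: {user: [past daily counts]}, today: {user: count today}.
    """
    flagged = {}
    for user, count in today.items():
        past = history.get(user)
        if not past:
            flagged[user] = "new"  # never seen before
            continue
        baseline = mean(past)
        spread = pstdev(past)
        if spread == 0:
            # Perfectly steady history: any change at all is a deviation.
            if count != baseline:
                flagged[user] = "deviation"
        elif abs(count - baseline) / spread > threshold:
            flagged[user] = "deviation"
    return flagged


# Made-up daily login counts.
history = {"alice": [10, 12, 11, 9], "bob": [5, 5, 5, 5]}
today = {"alice": 50, "bob": 5, "carol": 3}
print(find_deviations(history, today))
```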
Sample fingerprints of an Advanced Threat.
Remediate and Automate
- Where else do I see the indicators of compromise
- Remediate infected systems
- Fix weaknesses, including employee education
- Turn the Indicators of Compromise into real time search to detect future threats
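The last step above, turning Indicators of Compromise into an ongoing search, can be sketched in Python as a filter over an event stream. The indicator values, field names and the `match_iocs` helper are hypothetical; in Splunk this would be a saved or real-time search rather than application code.

```python
def match_iocs(events, iocs):
    """Yield (event, hits) for every event containing a known indicator.

    iocs maps a field name to a set of bad values for that field.
    """
    for event in events:
        hits = {
            field: value
            for field, value in event.items()
            if value in iocs.get(field, set())
        }
        if hits:
            yield event, hits


# Made-up indicators and events, for illustration only.
iocs = {
    "dest_ip": {"203.0.113.7"},
    "file_hash": {"abc123"},
}
events = [
    {"host": "web01", "dest_ip": "198.51.100.2"},
    {"host": "pc42", "dest_ip": "203.0.113.7"},
    {"host": "pc17", "file_hash": "abc123", "dest_ip": "198.51.100.9"},
]

matches = list(match_iocs(events, iocs))
for event, hits in matches:
    print(event["host"], hits)
```

Because `match_iocs` is a generator, it can be fed a live stream of events as easily as a stored list, which is the spirit of the "real time search" note.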
– Splunk Enterprise Security (2.4 released next week – 20 something April)
– Predefined normalisation and correlation, extensible and customisable
– F5, Juniper, Cisco, FireEye etc. are all partners and integrate well into Splunk.
Move away from talking about security events to all events – especially with advanced threats, any event can be a security event.
I have a further meeting with some of the Splunk security specialists tomorrow so will provide a further update later.
Overall Splunk seems to tick a lot of boxes and certainly taps into the explosion of data we must correlate and understand in order to maintain our environment and spot subtle, intelligent security threats.