Extending security intelligence with big data
Presentation by Martin Borrett from IBM.
- Why cyber security is a big data problem
- How the diverse and rapidly changing set of both structured and unstructured data can play a key role in identifying the increasingly sophisticated threats that organisations face.
- Move from reactive to a more proactive stance by actively searching for indicators that something could be amiss.
As an example, the attacks earlier this year on the New York Times when it ran a story about China’s prime minister;
- Not detected for 4 months
- 45 different pieces of malware were used, with only 1 being picked up by AV
- All employee passwords stolen
- Computers of 53 employees accessed
- University computers were used as proxies to hide the traffic source.
We have a greater need for security intelligence;
- User identities
- Asset discovery
- Network flow
- Vulnerabilities / risks
- Security and threat feeds
- Baselines of behaviour (system and user)
- Unstructured data such as free text user inputs, feeds from social media, general news sources etc.
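The "baselines of behaviour" point above can be sketched in a few lines. This is a minimal, illustrative example only: the data, the per-user daily login counts, and the 3-sigma threshold are all assumptions, not anything from the presentation. Real solutions would baseline many signals at far greater scale.

```python
from statistics import mean, stdev

# Hypothetical history of daily login counts per user (illustrative data).
history = {
    "alice": [3, 4, 2, 5, 3, 4, 3],
    "bob":   [1, 2, 1, 1, 2, 1, 2],
}

def is_anomalous(user: str, todays_count: int, sigmas: float = 3.0) -> bool:
    """Flag a count deviating from the user's own baseline by > `sigmas` stdevs."""
    baseline = history[user]
    mu, sd = mean(baseline), stdev(baseline)
    # Guard against a zero stdev for perfectly regular users.
    return abs(todays_count - mu) > sigmas * max(sd, 1e-9)

print(is_anomalous("bob", 40))   # far outside bob's normal range -> True
print(is_anomalous("alice", 4))  # within alice's normal range -> False
```

The point of baselining per user (rather than one global threshold) is exactly the "hide in the noise of normal activity" problem below: 40 logins may be normal for a service account and wildly abnormal for bob.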
Attackers are continuously adapting to leave minimal trace and to hide their behaviour in the noise of ‘normal’ activity. Due to the potentially huge volumes of data, these systems must be highly scalable.
Traditionally SIEM type solutions have focussed on real-time alerting that is Proactive, Formalised (standard queries / searches) and fast. This is great, but can it be in depth enough, and is real-time alerting always required when searching for long-term APT style attacks?
Move towards adding more Asymmetric / Forensic type capabilities that are more Predictive, Inquisitive, and in depth. These require considerably more skill and deeper understanding to create, and the searches will be much more ‘custom’, but this is the best (only?) way to find the subtle and clever attackers, especially if doing so in a timely manner is required (it is!).
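The contrast between the two styles can be sketched as follows. Everything here is hypothetical (the event stream, field names, and the 3-failure threshold are illustrative assumptions): a formalised real-time rule evaluates each event as it arrives, while a forensic query is an ad-hoc, custom question asked retrospectively over the full history.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event stream (all fields and values are illustrative).
events = [
    {"ts": "2013-01-01T10:00", "user": "jsmith", "action": "login_fail", "src": "10.0.0.5"},
    {"ts": "2013-01-01T10:01", "user": "jsmith", "action": "login_fail", "src": "10.0.0.5"},
    {"ts": "2013-01-01T10:02", "user": "jsmith", "action": "login_fail", "src": "10.0.0.5"},
    {"ts": "2013-02-14T03:12", "user": "jsmith", "action": "login_ok",  "src": "198.51.100.7"},
]

def realtime_rule(stream, threshold=3):
    """SIEM style: a fixed, formalised rule fired per event (fast, standard)."""
    fails = defaultdict(int)
    alerts = []
    for e in stream:
        if e["action"] == "login_fail":
            fails[e["user"]] += 1
            if fails[e["user"]] >= threshold:
                alerts.append(f"possible brute force: {e['user']}")
    return alerts

def forensic_query(stream):
    """Forensic style: a one-off custom question over the full history,
    e.g. 'which successful logins happened between 02:00 and 04:00?'"""
    return [e for e in stream
            if e["action"] == "login_ok"
            and 2 <= datetime.fromisoformat(e["ts"]).hour < 4]

print(realtime_rule(events))
print(forensic_query(events))
```

The forensic query here was not pre-planned; it is the kind of inquisitive, bespoke search an analyst writes after forming a hypothesis, which is why it demands more skill than maintaining the standard rule set.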
Current SIEM type security processes may look like;
This has a heavy focus on structured data and performing real time correlation to get to a potential incident to investigate.
Moving more into the ‘big data’ world we will enrich this with a lot more data sources, much of it unstructured;
This will potentially also take outputs from the traditional SIEM tool as one of the feeds and enrich them with other data. An example may be an event that looks like a potential issue but lacks enough detail to act on in the SIEM; this could be passed to the ‘big data’ solution and correlated with a much wider data set to determine whether it is a real issue.
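The enrichment flow just described might look something like the sketch below. All the feed formats, field names, and sources (the threat feed, the email log) are assumptions for illustration, not the API of any particular product: a low-confidence SIEM alert is joined against the wider data set, and the extra evidence either raises it to something actionable or leaves it as noise.

```python
# Hypothetical SIEM output and wider data sources (all illustrative).
siem_alerts = [{"id": 1, "host": "ws-042", "dest_ip": "203.0.113.9", "confidence": "low"}]
threat_feed = {"203.0.113.9": "known C2 infrastructure"}
email_log   = [{"host": "ws-042", "subject": "Invoice.pdf.exe attached"}]

def enrich(alert):
    """Correlate a weak SIEM alert with wider data to confirm or dismiss it."""
    evidence = []
    # Does the destination appear in an external threat feed?
    if alert["dest_ip"] in threat_feed:
        evidence.append(threat_feed[alert["dest_ip"]])
    # Any suspicious email activity on the same host?
    evidence += [m["subject"] for m in email_log if m["host"] == alert["host"]]
    enriched = dict(alert, evidence=evidence)
    if evidence:
        enriched["confidence"] = "high"
    return enriched

print(enrich(siem_alerts[0]))
```

The key design point is that neither the threat feed match nor the email alone justified an alert; it is the correlation across otherwise separate data sets that produces the actionable incident.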
The top part of the above diagram (Real-time Processing and Security Operations) is relatively similar to existing SIEM solutions, focusing on real time analysis and processing, just with a potentially larger data set.
The bottom right (Big Data Warehouse, Big Data Analytics and Forensics) focuses on the much more advanced, not real time analysis and forensic type investigations.
Context is key.
- You must be able to derive security relevant semantics from elements of the raw data.
- There must be the capability to distil the huge volumes of data down to useful and real insights.
- Human knowledge must be able to be added to the solution to improve processing and automate more tasks.
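"Deriving security relevant semantics from elements of the raw data" is, in practice, often about turning unstructured text into structured entities. A minimal sketch, assuming a standard sshd-style log line (the format and field names are illustrative, not from the presentation):

```python
import re

# A raw, semi-structured log line (sshd-style, used here as an assumed example).
LINE = "Feb 14 03:12:07 gw sshd[981]: Failed password for admin from 198.51.100.7 port 52114"

# Extract the security-relevant semantics: outcome, user identity, source IP.
PATTERN = re.compile(
    r"(?P<outcome>Failed|Accepted) password for (?P<user>\S+) "
    r"from (?P<ip>\d+\.\d+\.\d+\.\d+)"
)

def extract(line):
    """Return {'outcome', 'user', 'ip'} from a log line, or None if no match."""
    m = PATTERN.search(line)
    return m.groupdict() if m else None

print(extract(LINE))  # {'outcome': 'Failed', 'user': 'admin', 'ip': '198.51.100.7'}
```

Once the raw text is reduced to entities like these, it can be correlated with the user identities, asset data, and baselines listed earlier; this is the "distil huge volumes down to real insights" step, and the regular expressions themselves are one place where human knowledge gets encoded into the solution.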
Some key security questions a big data analysis solution will help your organisation answer include;
Another key area these tools can help with is in creating visualisations of attacks and suspicious behaviour. As they will have data from all the systems in the enterprise, along with various external feeds, they can provide visual representations of the behaviour as it moves into and through the organisation.
For me the key consideration is to have one ‘Big Data’ solution that collects all the relevant data for your organisation, from traditional log files, through corporate emails, to social media and threat feeds.
This also needs to move out of the security realm as people are talking ‘Big Data’ but in reality still have the traditional SIEM mindset. Running a tool like this for security, while the ops guys are also running logging and monitoring tools is massively wasteful in terms of cost, storage, management overhead, and also likely results in situations where some useful information only ends up in one tool, not both.
We need to move forwards to the mindset of an Enterprise ‘Big Data’ solution for storing and correlating all the business data – logs, emails, external sources, user and system behaviours etc. This solution then has different dashboards, reporting solutions, search heads or whatever for the different use cases such as security, ops, and business users (system performance, investigating transaction issues etc.). Obviously areas like separation of duties and access controls must be considered here, but I believe this type of solution is the only way for this to really succeed and provide the best value for the business.