Incident Management and Report Document

Cloud IT Security Incident Management Overview with Sample Incident Report

The goal of IT Security Incident Management is to get IT services back up and operating as quickly as possible. An IT Security Incident Management process should include automated detection and remediation tools, a clear workflow process, and detailed incident reports. IT Service Management (ITSM) is a well-established discipline, but cloud computing adds several new tools that need to be incorporated.

For both cloud and on-prem environments, an IT Security Incident Report is a vital record that tracks the details of specific incident occurrences. It is used by software development, cybersecurity operations, cloud security, customer service, compliance and risk management organizations.

An effective incident report includes details, such as what happened, when it happened, the nature of the occurrence, the business impact, and so on. A clear and detailed incident report is a key resource to analyze problems, identify solutions, and improve service delivery. There is a sample blank incident report form at the end of this document that you can use as a resource.

Types of Incidents

Three distinct types of incidents occur in IT infrastructure: major incidents, repetitive incidents, and complex incidents.

Major Incidents: Large-scale incidents such as application failures and cyber attacks don’t happen very often. When they do, they can disrupt operations and have a significant impact on the business. Businesses must be ready to deal with major incidents swiftly and effectively.
Repetitive Incidents: Misconfiguring IT devices or applications can lead to a recurrence of incidents. Escalating and prioritizing the problem is the best way to deal with it. Such incidents may or may not have a significant impact, but they will accumulate and will require time to resolve.
Complex Incidents: The majority of the incidents that occur are of medium or low severity and are fairly simple to resolve. Occasionally, however, a critical or high-severity incident will occur that requires an immediate response.

Incidents are often categorized based on severity, such as critical, high, medium, and low or L1, L2, L3. Severity ratings are connected to the overall business impact and urgency of the issue.
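As a rough illustration of how a severity rating can be derived from business impact and urgency, here is a minimal Python sketch. The three-level scale, the additive score, and the thresholds are all assumptions made for this example; substitute the matrix defined in your own incident policy.

```python
from enum import IntEnum


class Level(IntEnum):
    """Illustrative three-level scale for both impact and urgency."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def assign_severity(impact: Level, urgency: Level) -> str:
    """Map impact and urgency to a severity label.

    The thresholds below are hypothetical and should be tuned
    to your organization's policy.
    """
    score = impact + urgency  # ranges from 2 to 6
    if score >= 6:
        return "critical"
    if score == 5:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


print(assign_severity(Level.HIGH, Level.MEDIUM))  # prints "high"
```

A lookup table keyed on (impact, urgency) pairs would work equally well and is easier to audit when the mapping is not strictly additive.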

Automated Incident Response Tools

Deploy tools to automate and assist with detection, analysis, and remediation of security problems. For modern cloud environments these tools would include:

Cloud Security Posture Management (CSPM) to continuously detect misconfigurations and regulatory compliance issues
Cloud Workload Protection Platform (CWPP) to continuously monitor workloads for vulnerabilities, malware, etc.
Cloud Data Loss Prevention (DLP) and Data Security Posture Management (DSPM) to classify and protect confidential data
Network and traffic analysis with microsegmentation to limit the lateral movement of an attack or confidential data
Cloud-Native Application Protection Platform (CNAPP), a combination of CSPM and CWPP
Anti-malware, Extended Detection and Response (XDR), to detect threats and attacks
Security Information and Event Management (SIEM), to correlate security information and events from multiple sources
Security Orchestration, Automation, and Response (SOAR), to coordinate and execute tasks between people and tools
Firewall, intrusion prevention, and denial-of-service (DoS) protection, to block attacks at the perimeter

Incident Management Workflow

Incident reporting and management is required and should be prioritized. It should be a well-organized approach for the organization to use to respond to non-compliance, system failures, incidents, accidents, cyber-attacks, outages, and breaches, among other things.

Be prepared before an incident occurs. Roles and responsibilities should be clearly defined based on skills and requirements. It must be clear who is responsible for what actions in the case of an incident. Employees need to be trained on the incident management workflow.

An incident management workflow should include:

Identify the incident and assign severity based on impact and urgency
Notify all impacted stakeholders
Assign tasks to the correct individuals
Track the incident from detection to resolution and closure
Escalate (if required) when SLAs are breached
Generate reports and documentation for investigation and root cause analysis
Resolution and remediation
Closure

*Note: You should have a no-approval process for resolving incidents
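The workflow steps above can be sketched as a small state machine that records each transition for later reporting. This is an illustrative sketch, not a prescribed implementation: the state names, the allowed transitions, and the `Incident` class are assumptions chosen for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Allowed transitions between workflow states (illustrative names).
# Escalation is optional, so "assigned" can go straight to "resolved".
TRANSITIONS = {
    "identified": {"notified"},
    "notified": {"assigned"},
    "assigned": {"escalated", "resolved"},
    "escalated": {"resolved"},
    "resolved": {"closed"},
    "closed": set(),
}


@dataclass
class Incident:
    title: str
    severity: str
    state: str = "identified"
    history: list = field(default_factory=list)

    def advance(self, new_state: str, note: str = "") -> None:
        """Move to new_state if the workflow allows it, and log the change."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot move from {self.state} to {new_state}")
        timestamp = datetime.now(timezone.utc)
        self.history.append((timestamp, self.state, new_state, note))
        self.state = new_state


# Example: an incident resolved without escalation.
incident = Incident("Publicly readable storage bucket", "high")
for step in ("notified", "assigned", "resolved", "closed"):
    incident.advance(step)
print(incident.state)  # prints "closed"
```

Keeping the transition history on the incident itself gives the report author a ready-made timeline from detection through closure.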

Communication with stakeholders is essential. Stakeholders are not only members of the IT and Risk and Compliance organizations; they also include employees, customers, partners, and regulators. Regularly share announcements, notifications, and status updates as appropriate with all stakeholders. Do not rely solely on email; communication often requires phone calls and meetings.

Essential Elements of an Incident Report

The person experiencing the problem or the team supporting related operations creates an incident report with input from others involved in resolving it.

A brief report is appropriate if the occurrence is minor and has little impact. However, if the incident is serious, all information must be recorded in an extensive major incident report.

Key information to include

Name of the individual who prepared the incident report
Correct date and time the incident was detected and resolved
Identification of which services were affected and/or unavailable
A concise and relevant description of the incident with the actual details
Whether or not any SLAs (service level agreements) were breached, and if the incident requires a penalty or escalation
Compliance requirements
Document all remediation actions and troubleshooting methods
Record any business impacting details including outages, investigations, publicity, customer remediation, regulatory, etc.
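One way to make sure none of the key information above is left out is to capture the report as a structured record and check it for empty fields before submission. The sketch below is only an illustration: the `IncidentReport` class, its field names, and the `missing_fields` helper are hypothetical and simply mirror the list above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class IncidentReport:
    """Hypothetical record mirroring the key information listed above."""
    prepared_by: str = ""
    detected_at: str = ""
    resolved_at: str = ""
    services_affected: str = ""
    description: str = ""
    sla_breached: Optional[bool] = None  # None means "not yet assessed"
    compliance_requirements: str = ""
    remediation_actions: str = ""
    business_impact: str = ""


def missing_fields(report: IncidentReport) -> list:
    """Return the names of fields that are still empty or unassessed."""
    return [
        f.name
        for f in fields(report)
        if getattr(report, f.name) in ("", None)
    ]


draft = IncidentReport(prepared_by="A. Analyst", sla_breached=False)
print(missing_fields(draft))
```

Here `sla_breached=False` counts as answered, because only an empty string or `None` is treated as missing.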

Incident Report DOs

Describe relevant issues in detail
Focus on the facts
Assume that the report will be made public (even if it is initially confidential)
Make sound professional decisions based on severity and business impact
Consider mitigating circumstances
Determine if the incident indicates a security breach has occurred
Develop a feasible remediation strategy
Understand your incident reporting policy and procedure
Use professional terms and write legibly
Include the names and addresses of all persons (employees, contractors, partners, customers, etc.) who are aware of the situation

Incident Report DON’Ts

Speculate and assume without knowing the facts
Discuss previous similar occurrences
Talk about money, costs, and spending decisions
Discuss failures or delays in acting
Predict what will happen if nothing is done
Include insignificant facts that implicate or blame anyone for the occurrence

Sample Incident Report

Incident response teams should make an initial incident report and then continue to report updates and additional information to their Chain of Command/Security Specialist (as collected).

1. Contact
Full name
Job title
Division or office
Work phone
Mobile phone
Email address
Additional contact information:

2. Issue (check all that apply)
Account Compromise (lost password, suspicious account behavior,…)	Social Engineering (phishing, scams)
Denial-of-Service (including distributed)	Technical Vulnerability
Malware (virus, worm, trojan, crypto,…)	Theft/Loss of equipment or media
Misuse of Systems (acceptable use)	Unauthorized Access (cloud, systems, applications, storage, devices)
Reconnaissance (scanning, probing)	Data Exposure (public access, public share, breach of sensitive data)
Open Port	Misconfigurations (exposed secrets, default passwords, risky settings,…)
Description of incident:

3. Severity and Scope (check all that apply)
Critical (affects system-wide information resources)
High (entire network, cloud, or critical business systems)
Medium (affects infrastructure, network, cloud, servers, or admin accounts)
Low (only affects workstations or user accounts)
Unknown/Other (please describe below)
NOTE: All incidents deemed critical or high require additional notification by phone
Estimated quantity of assets affected
Estimated quantity of users affected
Third parties involved or affected (vendors, contractors, partners)
Additional information:

4. Impact (check all that apply)
Loss of Access to Services	Propagation (other regions, segments, assets, partners, customers…)
Loss of Productivity	Unauthorized Disclosure of Information
Loss of Reputation	Unauthorized Modification of Information
Loss of Revenue	Unknown/Other (please describe below)
Additional Impact Information:

5. Sensitivity of Affected Data/Information (check all that apply)
Critical Information	Personally Identifiable Information (PII)
Non-Critical Information	Intellectual/Copyrighted Information
Publicly Available Information	Secrets (critical infrastructure/key resources)
Financial Information	Protected Healthcare Information (PHI)
Payment Card Information (PCI)	Unknown/Other (please describe below)
Data encrypted?
Location of data (bucket/blob, file, queue, attached volume, persistent volume, network segment)
Quantity of data affected (number of records, files, accounts, locations, assets…)
Additional affected data information:

6. Systems Affected (provide as much detail as possible)
Attack Sources (IP address, port,...)
Attack Destinations (IP address, port,…)
IP Addresses
Domain Names
Primary Functions of Affected Systems (web server, domain controller,…)
Operating Systems of Affected Systems (version, service pack, configuration,…)
Patch Level of Affected Systems (latest patches loaded, hotfixes,…)
Security Software on Affected Systems (anti-malware, firewall, versions, date of latest update,...)
Affected Systems/Assets (cloud platform, region, account, security group, asset ID,... )
Additional System Details:

7. Users Affected (provide as much detail as possible)
Names and Job Titles
System Access Levels or Rights of Affected Users (e.g., regular user, domain administrator, root)
Additional User Details:

8. Timeline (provide as much detail as possible)
Date and time when security officer first detected, discovered, or was notified about the incident
Date and time when the actual incident occurred (estimate if exact date and time unknown)
Date and time when the incident was contained or when all affected systems were restored (use most recent date and time)
Elapsed time between when the incident occurred and when it was discovered
Elapsed time between when the incident was discovered and the incident was contained or all affected systems were restored
Detailed Incident Timeline:

9. Remediation (provide as much detail as possible)
Actions taken to identify affected resources
Actions taken to remediate incident
Actions planned to prevent similar future incidents
Additional Remediation Details:
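
The two elapsed-time fields in section 8 of the form can be computed directly from the three recorded timestamps. The following Python sketch shows one way to do that; the function name, the dictionary keys, and the sample timestamps are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def timeline_metrics(occurred: datetime, detected: datetime,
                     contained: datetime) -> dict:
    """Compute the elapsed-time fields from the three form timestamps."""
    if not occurred <= detected <= contained:
        raise ValueError("expected occurred <= detected <= contained")
    return {
        "time_to_detect": detected - occurred,
        "time_to_contain": contained - detected,
    }


# Sample values, not taken from a real incident.
metrics = timeline_metrics(
    occurred=datetime(2024, 3, 1, 8, 0),
    detected=datetime(2024, 3, 1, 9, 30),
    contained=datetime(2024, 3, 1, 13, 0),
)
print(metrics["time_to_detect"])   # prints 1:30:00
print(metrics["time_to_contain"])  # prints 3:30:00
```

If the occurrence time is only an estimate, as the form allows, note that in the report so the derived durations are read as estimates too.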

In Conclusion

Incidents happen, and every incident is an opportunity to learn and improve. Don’t play the blame game; it must be safe for employees to report incidents when they are detected. You do not want team members hiding incidents to avoid negative repercussions. Clear documentation and a postmortem analysis of an incident are valuable tools to improve team execution. If you always close out an incident with a review of what you have learned and how you can improve, you’ll perform better when faced with similar issues in the future.

About Microsec.AI

Get easy-to-deploy, runtime visibility, protection, and compliance monitoring for cloud serverless, VM, and Kubernetes environments. Microsec.AI is the only agentless, data-centric, runtime cloud-native application protection platform (CNAPP) that protects your data and applications with data loss prevention (DLP), east-west network traffic control with self-healing micro segmentation, security posture management, and compliance analysis in one unified solution.