Challenges in Scoring Application Security Test Findings

Application security testing (or penetration testing) is often described, in simplified terms, as identifying application vulnerabilities and reporting how they were identified so that they can be replicated and ultimately remediated. I would like to complicate (or rather, refine) this definition by discussing two very important components that it leaves out:

Comprehensive security testing should identify more than vulnerabilities. It should also uncover and communicate weaknesses and other risks.
An essential step in documenting findings is contextualizing the risk that each finding poses, which helps application teams prioritize their work.

The distinction between vulnerabilities and weaknesses is often non-trivial, but it generally turns on exploitability. The security company Snyk defines weaknesses as potential vulnerabilities, which I generally agree with, though I think there are some notable exceptions.

An interesting example to consider is the presence or absence of a Content Security Policy (CSP), as defined by the HTTP response header Content-Security-Policy. The CSP mechanism is predominantly (though not exclusively) intended to mitigate cross-site scripting (XSS) vulnerabilities, but it is not a primary mitigation for them. XSS vulnerabilities should be corrected by handling untrusted data securely rather than by relying on a CSP, which acts as a defense-in-depth mechanism that can be bypassed given the right conditions and configuration.

By this framing, a missing CSP is not a weakness at all under our working definition from Snyk. XSS vulnerabilities arise from insecure handling of untrusted data; it is not the absence of a CSP that results in XSS. A missing CSP is simply not a potential vulnerability, but this does not mean that implementing a CSP should not be a priority. In fact, there are surely web applications in which a robust CSP is an effective mitigation, preventing practical exploitation of otherwise systemic XSS issues that are challenging to fully identify and remediate.
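The distinction can be made concrete with the kind of header check a tester or scanner might run. The Python sketch below is a minimal illustration under stated assumptions: the function name `csp_observations` is hypothetical, it only inspects an enforced `Content-Security-Policy` header in a plain dictionary of response headers, and it deliberately ignores multiple policies, policies delivered via `<meta>`, `'strict-dynamic'`, and bypasses through allowed script hosts.

```python
def csp_observations(headers: dict[str, str]) -> list[str]:
    """Return observations about a response's Content-Security-Policy.

    Hypothetical helper for illustration only: it checks the enforced CSP
    header for a few well-known weak settings and is not a complete audit.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    policy = lowered.get("content-security-policy")
    if policy is None:
        if "content-security-policy-report-only" in lowered:
            return ["CSP is report-only; nothing is enforced"]
        return ["No Content-Security-Policy header"]

    # Split "directive value value; directive value" into a mapping.
    # If a directive is repeated, browsers use the first occurrence.
    directives: dict[str, list[str]] = {}
    for part in policy.split(";"):
        tokens = part.split()
        if tokens:
            directives.setdefault(tokens[0].lower(), tokens[1:])

    # script-src falls back to default-src when it is not specified.
    sources = directives.get("script-src", directives.get("default-src"))
    if sources is None:
        return ["No script-src or default-src; scripts are unrestricted"]

    sources = [source.lower() for source in sources]
    # CSP Level 2+ browsers ignore 'unsafe-inline' when a nonce or hash is present.
    has_nonce_or_hash = any(
        source.startswith(("'nonce-", "'sha256-", "'sha384-", "'sha512-"))
        for source in sources
    )
    observations = []
    if "'unsafe-inline'" in sources and not has_nonce_or_hash:
        observations.append("'unsafe-inline' permits inline scripts")
    if "'unsafe-eval'" in sources:
        observations.append("'unsafe-eval' permits eval() and similar APIs")
    if "*" in sources:
        observations.append("wildcard source permits scripts from any host")
    return observations
```

Note what the output can and cannot say. "No Content-Security-Policy header" is an observation about a missing control; it is not evidence that the application has an XSS vulnerability, and an empty list is not evidence that it has none.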

Recognition that CSP can mitigate the risk of a major class of web application vulnerabilities has led to what I would consider a misclassification of the issue and a misuse of existing standards in order to prioritize the implementation of CSP (though other incentives may be in play as well). Consider the assignment of CVEs, which are intended to identify vulnerabilities. Numerous CVEs relate to a missing CSP, a CSP bypass, or some other CSP flaw. Here is one example of a missing CSP that was assigned a severity score (CVSS) of 3.1. Does this score even make sense? I do not think so.
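For readers less familiar with how a number like 3.1 is produced, the Python sketch below implements the CVSS v3.1 base score equations from the FIRST specification, including its Roundup function. It is an illustrative sketch, not an official calculator, and the function names are assumptions. The example vector is also an assumption: it is one vector that yields 3.1, not necessarily the vector used for the CVE mentioned above. Note that every input is a technical property of the flaw; nothing describes the application, its data, or the organization.

```python
import math

# CVSS v3.1 base metric weights.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 specification's Roundup."""
    as_int = round(value * 100_000)
    if as_int % 10_000 == 0:
        return as_int / 100_000.0
    return (math.floor(as_int / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute a CVSS v3.1 base score from a vector such as 'CVSS:3.1/AV:N/...'."""
    metrics = dict(part.split(":") for part in vector.split("/") if not part.startswith("CVSS"))
    scope_changed = metrics["S"] == "C"

    # Impact sub-score from confidentiality, integrity, and availability.
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Privileges Required is weighted differently when scope changes.
    pr = (PR_CHANGED if scope_changed else PR_UNCHANGED)[metrics["PR"]]
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]

    if impact <= 0:
        return 0.0
    if scope_changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))
```

For example, `base_score("CVSS:3.1/AV:N/AC:H/PR:N/UI:R/S:U/C:L/I:N/A:N")` returns 3.1, and the common reflected XSS vector `CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N` returns 6.1.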

The CSP mechanism can be applied effectively and is certainly a web application security best practice, but I would not call its absence a vulnerability or assign it a CVSS score. An application can be perfectly secure without a CSP. The problem, of course, is that we cannot prove that most systems are perfectly secure. Even with security-minded developers and comprehensive security testing, limitations often leave some vulnerabilities undiscovered. This fundamental assumption should inform the application security lifecycle. There are possibly unknown vulnerabilities. There exists, therefore, risk. This risk is unknown, but an experienced application security professional can often make strong inferences about these unknowns in order to recommend practices that might mitigate it.

So how do we classify the absence of a CSP within a system that otherwise appears secure? I would simply call it a risk. Look, we know that XSS vulnerabilities can be challenging to identify and fix in complex systems. They are often found even after security testing. The risk is there. Therefore, we ought to try to quantify the risk posed by a missing CSP in order to contextualize and prioritize adding one to a given web application. As a security tester, I would report a missing CSP finding and assign a risk rating (Low, Medium, and so on) based on my knowledge of the application, its known flaws, and the potential existence and impact of unknown flaws that could lead to vulnerabilities.
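A risk rating of this kind is usually derived from separate judgments about likelihood and impact. The Python sketch below is one hypothetical way to express that, loosely modeled on the overall severity matrix in the OWASP Risk Rating Methodology (factor scores from 0 to 9, averaged and banded into LOW, MEDIUM, or HIGH). It is an illustration, not Digital Boundary Group's method; the names are assumptions, and the hard part, estimating how likely it is that undiscovered flaws exist, is left to the tester's judgment in the factor scores.

```python
from enum import IntEnum
from statistics import mean


class Level(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# Overall severity matrix: rows keyed by impact, columns indexed by likelihood
# (LOW, MEDIUM, HIGH). Loosely follows the OWASP Risk Rating Methodology.
SEVERITY = {
    Level.LOW: ("Note", "Low", "Medium"),
    Level.MEDIUM: ("Low", "Medium", "High"),
    Level.HIGH: ("Medium", "High", "Critical"),
}


def level_from_score(score: float) -> Level:
    """Band an averaged 0-9 factor score into LOW, MEDIUM, or HIGH."""
    if not 0 <= score <= 9:
        raise ValueError("factor scores must be between 0 and 9")
    if score < 3:
        return Level.LOW
    if score < 6:
        return Level.MEDIUM
    return Level.HIGH


def rate(likelihood_factors: list[float], impact_factors: list[float]) -> str:
    """Average each set of tester-assigned factor scores and look up the rating."""
    likelihood = level_from_score(mean(likelihood_factors))
    impact = level_from_score(mean(impact_factors))
    return SEVERITY[impact][likelihood]
```

For example, `rate([2, 3, 1], [7, 6, 8])` (low estimated likelihood, high impact) returns `"Medium"` under this matrix. The numbers only structure the tester's reasoning; they do not replace knowledge of the application.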

Given the choice, I would not assign this finding a CVSS score, because it is not a vulnerability. Unfortunately, as a consultant who must meet the needs of clients, I am part of the problem. For some clients, a CVSS score is a requirement for every finding. This is obviously an inappropriate use of CVSS, but I can understand how it may simplify the ingestion and triage of issues into a vulnerability management platform. The systems that support our processes are not necessarily designed to be philosophically pure. They are designed to work. That said, this simplicity may come at a significant cost.

Using CVSS as a universal scoring metric for all managed vulnerabilities seems like an obvious way to simplify vulnerability management. After all, organizations ingest vulnerability data from a large number of sources. If each source reports using a different scoring or rating system, the vulnerability triage process carries an additional burden. Unfortunately, this overlooks a major issue: there is a significant difference between issues reported by knowledgeable security testers and those reported by tooling.

Consider tools that conduct software bill of materials (SBOM) analysis to identify vulnerable components and dependencies. As more projects adopt these practices, the number of reported issues will increase substantially. These tools identify CVEs reported against known components and provide the assigned CVSS scores. If an organization merges these findings with findings from expert-driven security testing (hopefully no one is doing exactly this), there is a major problem. The expert-identified findings are validated against the target application, but the CVSS scores are derived from the worst-case scenario for each CVE, which may never be practically exploitable (most high and critical CVEs are not exploited) and may have been assigned entirely inappropriately. That is not to say that results from expert security testing are flawless, but there is a major difference in the level of practical consideration.
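The difference in provenance can be preserved rather than flattened away. The following Python sketch is hypothetical: the `Finding` fields, the source labels, and both ordering functions are assumptions for illustration, not a feature of any particular vulnerability management platform. It contrasts a naive merge that sorts everything by CVSS base score with an ordering that keeps validated, contextually rated findings apart from unvalidated tool output.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ordering of contextual risk ratings; unrated findings sort last.
RISK_ORDER = {"Critical": 4, "High": 3, "Medium": 2, "Low": 1, "Note": 0}


@dataclass
class Finding:
    title: str
    source: str                          # hypothetical labels, e.g. "sca-tool" or "pentest"
    validated: bool                      # confirmed against this application?
    cvss_base: Optional[float] = None    # technical severity, if one was supplied
    risk_rating: Optional[str] = None    # contextual rating from a tester or triage team


def naive_order(findings: list[Finding]) -> list[Finding]:
    """Sort purely by CVSS base score, as a single 'universal' field encourages."""
    return sorted(findings, key=lambda f: f.cvss_base or 0.0, reverse=True)


def triage_order(findings: list[Finding]) -> list[Finding]:
    """Put validated findings first, then order by contextual rating, then by CVSS."""
    def key(f: Finding) -> tuple[bool, int, float]:
        return (f.validated, RISK_ORDER.get(f.risk_rating, -1), f.cvss_base or 0.0)
    return sorted(findings, key=key, reverse=True)
```

With one unvalidated tool finding scored 9.8 and one validated tester finding rated High, `naive_order` puts the tool finding first, while `triage_order` puts the validated finding first. The unvalidated finding still needs someone with internal context to triage it; the ordering only avoids treating its base score as if it were a validated risk rating.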

Ultimately, CVSS scores measure technical severity, not risk specific to an application and organization (CVSS also does not account for uncertainty). What organizations really need to manage application security risks effectively, in my opinion, is internal expertise to triage findings and contextualize their business risk. That business risk cannot be fully known by an external party conducting security testing, though experienced testers will likely offer relatively well-informed assessments of it.

All of this can be challenging to communicate, which is why I wrote this post. At Digital Boundary Group (where I presently work), we have a tradition of assessing and assigning risk to our findings. We tend not to distinguish between vulnerabilities, weaknesses, and risks that do not quite fit into the previous classifications. After much thought (and internal discussions with our thoughtful app team), I think this is the most appropriate approach. Deficient security practices should be assigned a risk appropriate for the context even without a specific known weakness or vulnerability. Best practices exist for a reason and should be appropriately prioritized in the presence of uncertainties and unknowns. This, I think, is an overlooked value that experienced security testing provides.

Appendix: Additional Thoughts

Another issue with CVSS is how inconsistently scores are applied, even by a single rating agency (such as NVD, which may not base its evaluation on all relevant technical details). We often look to CVEs for common issues to get an idea of how the industry assesses their severity, but identical issues often carry widely varying CVSS scores.
The Common Weakness Scoring System (CWSS) does not appear to be commonly used, but it may be a useful additional metric for enriching appropriate findings. That said, every additional metric that must be scored during an engagement takes away valuable time that could be spent hunting for additional issues. This is another reason why I favor simplicity in reporting risk ratings.
CVSS scores can be enriched with the Threat and Environmental metrics (in CVSS 4.0), but these still leave gaps in some areas. They also add to the time burden (see the point above).
Many security testing standards and guidelines (for example, the OWASP WSTG and the PCI Penetration Testing Guidance) recommend or accept risk rankings or ratings and do not require or recommend reporting CVSS scores against findings. The OWASP Top 10 list itself focuses on application risks rather than vulnerabilities or weaknesses.
See also the article Reflections on Vulnerability Risk Scoring – Severity != Risk for further discussion of why CVSS is unsuitable for assessing risk.
