How I'm Translating QA Test Planning to Security Test Cases
My tech writing led to manual QA testing, and now I’m transitioning into offensive security.
When I was doing manual QA testing there was some overlap with security testing; however, solid security testing is threat-informed, which means reasoning about:
· what to attack and
· why
I translated my QA testing background into a security threat matrix. I learned what went right, what went wrong, and I was introduced to threat modeling frameworks, which provide a more structured approach to identifying and tackling security risks.
Why Should Security Testing Be Its Own Thing?
QA asks, “Does it work as designed?” I created test plans to ensure that features and software followed requirements, and I hunted for issues. My findings were mainly for devs to fix bugs, errors, and defects. Now, how do I prepare to test whether someone can take advantage of vulnerabilities in the software? That is the security side, where a malicious actor tries to exploit weaknesses in the software.
To learn how to create a security testing matrix, I used AI as a sounding board to pressure-test my thinking and ensure that:
1. I stayed on track
2. It covered my information gaps
By ‘staying on track’ I mean that if I am approaching or doing things wrong, AI can correct me. Here is the starting point it gave me for approaching AppSec through a QA lens:
Goals
· Train my eye to see security issues without tools.
· Model AppSec issues through a QA lens by mapping expected behavior, abuse cases, and mitigations at the feature level.
To do
· Pick 1–2 simple features:
o Login
o Password reset
o File upload
o User profiles
Spoiler: I chose ‘Login’ and ‘Password reset’.
For each:
· Write 3-5 abuse cases, not vulnerabilities
Examples:
o “User accesses another user’s data”
o “User uploads unexpected file type”
o “User bypasses step in flow”
Questions to ask:
1. What is this feature supposed to do?
2. What does it trust?
3. What does the user control?
4. What happens if those assumptions are wrong?
5. How would I test that manually?
Basically, this trains me to spot security-relevant assumptions and practice feature-level analysis.
A couple of definitions for clarity:
Feature: A single, user-visible action with a clear entry and exit point.
A feature:
· Has one primary purpose
· Can be tested in isolation
· Produces a direct response
This differs from a workflow, which is a chain of features across time and state. Workflows:
· Are multi-step
· Multiply complexity
· Explode assumptions
Why start with a feature? Because features are used so often, their abuse cases are well known.
Bonus!
Design smell: A sign that something in a system’s design might be wrong, fragile, or risky — even if it still “works.”
It’s not necessarily a bug or a vulnerability, but it is a warning signal.
The idea comes from “code smells” in software engineering — patterns that hint at deeper problems.
A design smell applies that same idea to:
· system behavior
· workflows
· assumptions
· UX + backend interaction
· security boundaries
A design smell could indicate:
· “This design creates unnecessary risk”
· “This may become exploitable later”
· “This makes reasoning about security harder”
Security professionals pay attention to smells because:
· vulnerabilities often grow out of them
· they’re cheaper to fix early
· they reveal architectural thinking (or lack of it)
(I thought this was interesting as this was the first time I heard this term.)
Security Test Case Table:

| Feature | Expected Behavior | Assumption | Abuse Case | Security Risk |
| --- | --- | --- | --- | --- |
| Login | User logs in with valid creds | User only accesses own account | User logs in as someone else | Account takeover |
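A row like this can also be captured as a small data structure so it can feed test tooling later. Here is a minimal sketch in Python (the class and field names are my own, mirroring the table columns):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityTestCase:
    """One row of the security test case table."""
    feature: str
    expected_behavior: str
    assumption: str
    abuse_case: str
    security_risk: str


# The login row from the table above
login_takeover = SecurityTestCase(
    feature="Login",
    expected_behavior="User logs in with valid creds",
    assumption="User only accesses own account",
    abuse_case="User logs in as someone else",
    security_risk="Account takeover",
)

print(login_takeover.security_risk)
```

Making each row a record with a single, clean purpose is also exactly the normalization pressure that shows up later in this exercise.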
Cool, so now I’m on the path to learning how to think like Security QA.
Spoiler alert: Every path has bumps.
My first pass had the core login abuse cases:
Authentication bypass / account takeover
- Logging in as another user
- Default/admin credentials
Username enumeration
- Error message differences
- Timing differences
- Username-first flows
Brute force / credential stuffing (partially)
- Repeated login attempts
- Rate limiting concerns
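The ‘timing differences’ item can be demonstrated with a small local simulation (a hypothetical sketch, not any real system): when unknown usernames return immediately while known usernames pay the password-hashing cost, response time reveals which accounts exist.

```python
import hashlib
import hmac
import statistics
import time

# Toy user store: username -> password hash (illustrative only)
USERS = {"alice": hashlib.sha256(b"correct-horse").hexdigest()}


def vulnerable_login(username: str, password: str) -> bool:
    # Design smell: unknown users return immediately, skipping the
    # (comparatively slow) hashing work below.
    if username not in USERS:
        return False
    candidate = hashlib.sha256(password.encode()).hexdigest()
    time.sleep(0.003)  # stand-in for a deliberately slow hash (bcrypt/argon2)
    return hmac.compare_digest(USERS[username], candidate)


def median_seconds(fn, *args, runs: int = 20) -> float:
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


known = median_seconds(vulnerable_login, "alice", "wrong-password")
unknown = median_seconds(vulnerable_login, "mallory", "wrong-password")
# A consistent gap tells an attacker which usernames are real.
print(f"known user: {known:.4f}s, unknown user: {unknown:.4f}s")
```

The fix is to make both paths do the same amount of work so responses stay uniform regardless of whether the username exists.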
But here was ChatGPT’s critique:
“What you’ve done here is exactly what happens in real AppSec reviews: the table gets almost right, then needs a final normalization pass so each row has a single, clean purpose.”
What I did wrong:
· Rows were mixing behavior, attack, and control
· Remediation is sometimes incomplete or slightly off
And this is where I pushed back. Yes, I argue and discuss issues with AI. A lot.
Here were my major beefs:
1. ‘Expected behavior’ is unclear. Does this mean ‘expected behavior of a legitimate user’ or ‘what the system does when expected (good or bad) behavior is carried out’? If the latter, ‘user action’ is more fitting.
2. The ‘Assumption’ column seems to indicate how the system is expected to act; however, that may not be the case if cybersecurity isn’t baked in.
3. This is not how I would order the columns.
The QA Way
My QA background would have me create a testing matrix organized by feature, which included columns such as:
· Test Case
· Steps
· Expected Result
· Actual Result
· Notes
The ‘Notes’ column is where I document any concerns (such as a strange error message) or other info.
Here is a traditional QA test matrix (illustrative example):

| Test Case | Steps | Expected Result | Actual Result | Notes |
| --- | --- | --- | --- | --- |
| Login with valid credentials | Enter a valid username and password; submit | User is logged in to their own account | As expected | None |
Matrix Revision
ChatGPT agreed with revising the Security Test Case table to reflect:
1. Action / Condition (what happens)
2. Intended System Response (what should happen)
3. What Goes Wrong (abuse/failure)
4. Impact (risk)
5. Control (remediation)
This combined my QA thinking with AppSec thinking.
Here’s the new Security QA Feature Threat Matrix (hybrid of a test matrix and a threat model):
| User / System Action | Intended System Response | Failure / Abuse Case | Security Impact | Mitigation / Control |
| --- | --- | --- | --- | --- |
| Login: User submits valid credentials | Grant access only to owning account | Credentials reused by unauthorized party | Account takeover | MFA, credential binding, anomaly detection |
| Login: User submits invalid username | Return generic failure message | Username enumeration via error responses | Account discovery | Generic error messages, consistent responses |
| Login: User submits invalid credentials | Response timing is uniform | Timing differences reveal valid users | Account discovery | Constant-time responses, uniform backend logic |
| Login: User submits repeated invalid attempts | Throttle attempts without lockout abuse | Account lockout abuse | Denial of service | Progressive backoff, CAPTCHA, alerts |
| Login: User attempts default/admin credentials | Reject and log attempt | Default credentials enabled | Privilege escalation | Remove defaults, strong admin authentication |
| Password Reset: Request reset link | Send token only to owning email | Token intercepted or misused | Account takeover | Strong random tokens, TLS, token binding |
| Password Reset: Reset link expiration | Expire link after defined time | Token remains valid indefinitely | Account takeover | Short expiration, single-use tokens |
| Password Reset: New password input | Must differ from previous passwords & meet complexity | Password reuse allowed / weak password | Account compromise | Enforce password history and complexity rules |
| Password Reset: Old credentials | Invalidate old password after reset | Old password still works | Account takeover | Rotate credentials, invalidate sessions |
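One of the mitigations above, progressive backoff, can be sketched like this (a minimal illustration; the class and parameter names are mine, and a real implementation would also key on IP and layer in CAPTCHA and alerting):

```python
class LoginThrottle:
    """Progressive backoff: each failure doubles the required wait, capped
    at a maximum, instead of hard-locking the account (which an attacker
    could abuse to deny service to legitimate users)."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 60.0):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.failures: dict[str, int] = {}

    def required_delay(self, username: str) -> float:
        """Seconds the caller must wait before the next attempt."""
        n = self.failures.get(username, 0)
        if n == 0:
            return 0.0
        return min(self.base_delay * (2 ** (n - 1)), self.max_delay)

    def record_failure(self, username: str) -> None:
        self.failures[username] = self.failures.get(username, 0) + 1

    def record_success(self, username: str) -> None:
        self.failures.pop(username, None)  # reset on successful login


throttle = LoginThrottle()
for _ in range(3):
    throttle.record_failure("alice")
print(throttle.required_delay("alice"))  # delay grows 1 -> 2 -> 4 seconds
```

Notice the trade-off baked into the design: attempts get slower, but the account never locks, which addresses the ‘account lockout abuse’ row directly.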
In this case the columns are clearer.
| Column | What it really means |
| --- | --- |
| User / System Action | What input or situation occurs (valid, invalid, edge case) |
| Intended System Response | How the system should respond securely |
| Failure / Abuse Case | How that response can fail or be abused |
| Security Impact | Why that failure matters |
| Mitigation / Control | What prevents or reduces the impact |
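The password reset mitigations from the matrix (strong random tokens, short expiration, single-use) could look roughly like this (a hypothetical sketch; the names and the 15-minute TTL are my own choices):

```python
import secrets
import time
from typing import Optional


class ResetTokenStore:
    """Short-lived, single-use password reset tokens (illustrative sketch)."""

    TTL_SECONDS = 15 * 60  # short expiration window

    def __init__(self):
        # token -> (email, issued_at)
        self._tokens: dict[str, tuple[str, float]] = {}

    def issue(self, email: str) -> str:
        token = secrets.token_urlsafe(32)  # strong random token
        self._tokens[token] = (email, time.time())
        return token

    def redeem(self, token: str) -> Optional[str]:
        """Return the owning email if the token is valid; consume it either way."""
        record = self._tokens.pop(token, None)  # single-use: always removed
        if record is None:
            return None  # unknown or already used
        email, issued_at = record
        if time.time() - issued_at > self.TTL_SECONDS:
            return None  # expired
        return email


store = ResetTokenStore()
token = store.issue("alice@example.com")
print(store.redeem(token))  # valid on first use
print(store.redeem(token))  # second use is rejected
```

This covers three matrix rows at once: the token is unguessable, it expires, and the old token dies the moment it is used.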
Now here’s the plot twist: my writing group suggested I use Claude AI for research as they consider it superior to ChatGPT, so I decided to give it a shot.
And this was its critique: “What you're doing sounds more like test planning / security test cases than threat modeling. True threat modeling (STRIDE, PASTA, etc.) starts before testing by asking ‘what could go wrong architecturally?’ — then your test plans flow from that.”
And that’s where my previous blog post came from. I had not heard of threat modeling frameworks before, so I did some quick research. Now I’m changing my approach to focus more on planning security testing through a framework. Fortunately, it’s not a big deal:
Claude AI: “Your current approach is essentially the last step — you're just missing the threat identification step before it. Easy fix.”
I’ll be tackling that in my next post. I had to force myself away from system thinking to focus on the more atomic, concise feature. And in that I had to explore the ways a feature could be exploited and why. This helped me bridge the gap from ‘this feature follows requirements’ to ‘if someone wanted to break this, how would they do it and for what purpose?’
