How I'm Translating QA Test Planning to Security Test Cases

 


My tech writing career led to manual QA testing, and now I'm transitioning into offensive security. Manual QA testing overlaps with security testing in places; however, solid security testing is threat-informed. That means reasoning about:

·       what to attack and

·       why

I translated my QA testing background into a security threat matrix. Along the way I learned what went right and what went wrong, and I was introduced to threat modeling frameworks, which provide a more structured approach to identifying and tackling security risks.


Why Should Security Testing Be Its Own Thing?

QA asks, "Does it work as designed?" I created test plans to verify that features and software followed requirements, and I hunted for issues. My findings mainly went to devs so they could fix bugs, errors, and defects. But how do I prepare to test whether someone can take advantage of vulnerabilities? That's the security side, where a malicious actor tries to exploit weaknesses in the software.

To learn how to create a security testing matrix, I used AI as a sounding board to pressure-test my thinking and ensure:

1.     I stayed on track

2.     It covered my information gaps

By 'staying on track' I mean that if I'm approaching something the wrong way, the AI can correct me. Here's the starting point it gave me for approaching AppSec through a QA lens:

Goals

·       Train my eye to see security issues without tools.

·       Model AppSec issues through a QA lens by mapping expected behavior, abuse cases, and mitigations at the feature level.

To do

·        Pick 1–2 simple features:

o   Login

o   Password reset

o   File upload

o   User profiles

Spoiler: I chose ‘Login’ and ‘Password reset’.

For each:

·        Write 3–5 abuse cases, not vulnerabilities

Examples:

o   “User accesses another user’s data”

o   “User uploads unexpected file type”

o   “User bypasses step in flow”

 

Questions to ask:

1.     What is this feature supposed to do?

2.     What does it trust?

3.     What does the user control?

4.     What happens if those assumptions are wrong?

5.     How would I test that manually?

 

Basically, this trains me to spot security-relevant assumptions and practice feature-level analysis.

 

A couple of definitions for clarity:

Feature: A single, user-visible action with a clear entry and exit point.

A feature:

·        Has one primary purpose

·        Can be tested in isolation

·        Produces a direct response

This differs from a workflow, which is a chain of features across time and state. Workflows:

·  are multi-step

·  multiply complexity

·  explode assumptions

 

Why start with a feature? Because features are so widely used, their abuse cases are already well known.

 

Bonus!

Design smell: A sign that something in a system’s design might be wrong, fragile, or risky — even if it still “works.”

It’s not necessarily a bug or a vulnerability, but it is a warning signal.

The idea comes from “code smells” in software engineering — patterns that hint at deeper problems.

A design smell applies that same idea to:

·        system behavior

·        workflows

·        assumptions

·        UX + backend interaction

·        security boundaries

A design smell could indicate:

·        “This design creates unnecessary risk”

·        “This may become exploitable later”

·        “This makes reasoning about security harder”

Security professionals pay attention to smells because:

·        vulnerabilities often grow out of them

·        they’re cheaper to fix early

·        they reveal architectural thinking (or lack of it)

(I thought this was interesting as this was the first time I heard this term.)

 

Security Test Case Table:

This is what ChatGPT gave me to get me started.

| Feature | Expected Behavior | Assumption | Abuse Case | Security Risk |
| --- | --- | --- | --- | --- |
| Login | User logs in with valid creds | User only accesses own account | User logs in as someone else | Account takeover |


Cool, so now I’m on the path to learning how to think like Security QA.

Spoiler alert: Every path has bumps.


My first pass had the core login abuse cases:

Authentication bypass / account takeover

  • Logging in as another user
  • Default/admin credentials

Username enumeration

  • Error message differences
  • Timing differences
  • Username-first flows

Brute force / credential stuffing (partially)

  • Repeated login attempts
  • Rate limiting concerns
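The error-message and timing cases above map to two standard mitigations: a single generic failure message for every failure mode, and uniform work per request so timing doesn't leak which usernames exist. Here's a minimal sketch of that idea, assuming a toy in-memory user store (all names, salts, and values here are illustrative, not a production design):

```python
# Sketch of an enumeration-resistant login check (illustrative only).
import hashlib
import hmac

# Toy store: username -> salted password hash (a real app would use bcrypt/argon2)
_USERS = {"alice": hashlib.sha256(b"salt" + b"hunter2").hexdigest()}
# Dummy hash so unknown usernames still trigger a comparison (uniform work/timing)
_DUMMY_HASH = hashlib.sha256(b"salt" + b"placeholder").hexdigest()

# One generic message for every failure mode defeats error-based enumeration
GENERIC_ERROR = "Invalid username or password."

def login(username: str, password: str) -> str:
    # Always hash the supplied password, even for unknown users,
    # so the amount of work done is the same either way.
    supplied = hashlib.sha256(b"salt" + password.encode()).hexdigest()
    stored = _USERS.get(username, _DUMMY_HASH)
    # compare_digest compares in constant time for equal-length digests
    ok = hmac.compare_digest(stored, supplied) and username in _USERS
    return "Welcome!" if ok else GENERIC_ERROR
```

The point is that `login("bob", "x")` (bad username) and `login("alice", "x")` (bad password) return the identical message and do the same amount of hashing, so neither the response text nor the timing reveals which usernames are valid.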


But here was ChatGPT’s critique:

“What you’ve done here is exactly what happens in real AppSec reviews: the table gets almost right, then needs a final normalization pass so each row has a single, clean purpose.”

 

What I did wrong:

·       Rows were mixing behavior, attack, and control

·       Remediation was sometimes incomplete or slightly off

 

And this is where I pushed back. Yes, I argue and discuss issues with AI. A lot.

Here were my major beefs:

1.     'Expected behavior' is ambiguous. Does it mean the expected behavior of a legitimate user, or what the system does when a given (good or bad) behavior is carried out? If the former, 'user action' is a more fitting name.

2.     The 'Assumption' column seems to describe how the system is expected to act; however, that may not hold if security isn't baked in.

3.     This is not how I would order the columns.


The QA Way

My QA background would have me create a testing matrix organized by feature, which included columns such as:

·       Test Case

·       Steps

·       Expected Result

·       Actual Result

·       Notes

The 'Notes' column is where I document any concerns (such as a strange error message) or other info.

Here is a traditional QA test matrix (a representative example; the columns match the list above):

| Test Case | Steps | Expected Result | Actual Result | Notes |
| --- | --- | --- | --- | --- |
| Login with valid credentials | Enter a valid username and password, then submit | User is signed in | Pass | None |


Matrix Revision

ChatGPT agreed with revising the Security Test Case table to reflect:

1.     Action / Condition (what happens)

2.     Intended System Response (what should happen)

3.     What Goes Wrong (abuse/failure)

4.     Impact (risk)

5.     Control (remediation)

This merged my QA thinking with AppSec thinking.

Here’s the new Security QA Feature Threat Matrix (hybrid of a test matrix and a threat model):

| User / System Action | Intended System Response | Failure / Abuse Case | Security Impact | Mitigation / Control |
| --- | --- | --- | --- | --- |
| Login: User submits valid credentials | Grant access only to owning account | Credentials reused by unauthorized party | Account takeover | MFA, credential binding, anomaly detection |
| Login: User submits invalid username | Return generic failure message | Username enumeration via error responses | Account discovery | Generic error messages, consistent responses |
| Login: User submits invalid credentials | Response timing is uniform | Timing differences reveal valid users | Account discovery | Constant-time responses, uniform backend logic |
| Login: User submits repeated invalid attempts | Throttle attempts without lockout abuse | Account lockout abuse | Denial of service | Progressive backoff, CAPTCHA, alerts |
| Login: User attempts default/admin credentials | Reject and log attempt | Default credentials enabled | Privilege escalation | Remove defaults, strong admin authentication |
| Password Reset: Request reset link | Send token only to owning email | Token intercepted or misused | Account takeover | Strong random tokens, TLS, token binding |
| Password Reset: Reset link expiration | Expire link after defined time | Token remains valid indefinitely | Account takeover | Short expiration, single-use tokens |
| Password Reset: New password input | Must differ from previous passwords & meet complexity | Password reuse allowed / weak password | Account compromise | Enforce password history and complexity rules |
| Password Reset: Old credentials | Invalidate old password after reset | Old password still works | Account takeover | Rotate credentials, invalidate sessions |
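The password reset rows translate naturally into code: strong random tokens, short expiration, and single use. Here's a minimal sketch, assuming an in-memory token store; the store, TTL, and names are illustrative placeholders, not a real implementation:

```python
# Sketch of reset-token handling: strong randomness, short expiry, single use.
import secrets
import time
from typing import Optional

TOKEN_TTL_SECONDS = 15 * 60  # short-lived, per the 'Short expiration' control
_tokens = {}  # token -> (email, issued_at); illustrative in-memory store

def issue_reset_token(email: str) -> str:
    token = secrets.token_urlsafe(32)  # cryptographically strong randomness
    _tokens[token] = (email, time.time())
    return token  # in a real system this is emailed, never echoed in a response

def redeem_reset_token(token: str) -> Optional[str]:
    record = _tokens.pop(token, None)  # pop() makes the token single-use
    if record is None:
        return None  # unknown or already used
    email, issued_at = record
    if time.time() - issued_at > TOKEN_TTL_SECONDS:
        return None  # expired: 'Token remains valid indefinitely' is the abuse case
    return email  # caller may now let this account set a new password
```

Redeeming the same token twice fails the second time, which is exactly the 'single-use tokens' control in the table; the TTL check covers the 'expire link after defined time' response.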


In this case the columns are clearer.

| Column | What it really means |
| --- | --- |
| User / System Action | What input or situation occurs (valid, invalid, edge case) |
| Intended System Response | How the system should respond securely |
| Failure / Abuse Case | How that response can fail or be abused |
| Security Impact | Why that failure matters |
| Mitigation / Control | What prevents or reduces the impact |
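One nice side effect of normalizing the columns is that the matrix can be encoded as data and queried like a QA test plan. A small sketch of that idea, using Python dataclasses and just two of the rows (field names mirror the columns; nothing here is prescribed by any tool):

```python
# Sketch: threat matrix rows as structured, queryable data.
from dataclasses import dataclass

@dataclass
class ThreatCase:
    action: str    # User / System Action
    intended: str  # Intended System Response
    abuse: str     # Failure / Abuse Case
    impact: str    # Security Impact
    control: str   # Mitigation / Control

MATRIX = [
    ThreatCase("Login: user submits valid credentials",
               "Grant access only to owning account",
               "Credentials reused by unauthorized party",
               "Account takeover",
               "MFA, credential binding, anomaly detection"),
    ThreatCase("Login: user submits invalid username",
               "Return generic failure message",
               "Username enumeration via error responses",
               "Account discovery",
               "Generic error messages, consistent responses"),
]

def cases_with_impact(impact: str) -> list:
    """Filter the matrix by security impact, e.g. to prioritize testing."""
    return [c for c in MATRIX if c.impact == impact]
```

With the rows as data, questions like "which cases lead to account takeover?" become one-line filters instead of a manual scan of the table.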


Now here’s the plot twist: my writing group suggested I use Claude AI for research as they consider it superior to ChatGPT, so I decided to give it a shot.


And this was its critique: “What you're doing sounds more like test planning / security test cases than threat modeling. True threat modeling (STRIDE, PASTA, etc.) starts before testing by asking "what could go wrong architecturally?" — then your test plans flow from that.”

And that's where my previous blog post came from. I hadn't heard of threat modeling frameworks before, so I did some quick research. Now I'm changing my approach to plan security testing through a framework. Fortunately, it's not a big shift.

Claude AI: “Your current approach is essentially the last step — you're just missing the threat identification step before it. Easy fix.”

I'll be tackling that in my next post. For this exercise, I had to force myself away from system-level thinking and focus on the more atomic, concise feature. In doing so, I explored the ways a feature could be exploited, and why. This helped me bridge the gap from 'this feature follows requirements' to 'if someone wanted to break this, how would they do it, and for what purpose?'
