<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Cloud CTRL]]></title><description><![CDATA[Real-world AWS architecture patterns I encounter working with customers — cost optimization, security, multi-account governance, and AI/ML infrastructure. No fluff, just patterns that work in production.]]></description><link>https://blog.patrickjduffy.net</link><image><url>https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/logos/5f51966981d8ab6c5765ed1a/08931d69-eed2-4190-9136-a9d31c8f4627.png</url><title>Cloud CTRL</title><link>https://blog.patrickjduffy.net</link></image><generator>RSS for Node</generator><lastBuildDate>Sat, 18 Apr 2026 23:19:42 GMT</lastBuildDate><atom:link href="https://blog.patrickjduffy.net/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Building a Guardrail System for Kiro]]></title><description><![CDATA[AI coding agents can write code, run shell commands, and modify your file system. That's the value proposition. It's also the risk surface.
Kiro CLI supports hooks — scripts that intercept agent actio]]></description><link>https://blog.patrickjduffy.net/building-a-guardrail-system-for-kiro</link><guid isPermaLink="true">https://blog.patrickjduffy.net/building-a-guardrail-system-for-kiro</guid><dc:creator><![CDATA[Patrick]]></dc:creator><pubDate>Mon, 06 Apr 2026 18:58:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/5f51966981d8ab6c5765ed1a/7f375a26-c3df-4585-a5bd-33b9236f6b3c.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AI coding agents can write code, run shell commands, and modify your file system. That's the value proposition. It's also the risk surface.</p>
<p>Kiro CLI supports hooks — scripts that intercept agent actions before they execute. Here are four security hooks you should adopt.</p>
<h2>How hooks work</h2>
<p>Hooks are shell scripts that fire before an agent uses a tool. They receive JSON context via stdin and control behavior through exit codes: <code>0</code> to allow, <code>2</code> to block. When a hook exits <code>2</code>, its stderr is returned to the agent as the reason.</p>
<p>You define them in your agent's JSON config under <code>hooks.preToolUse</code>. Each hook specifies a <code>command</code> (the script path), a <code>description</code> (shown to the agent when blocked), and a <code>matcher</code> that scopes the hook to a specific tool (<code>fs_write</code>, <code>execute_bash</code>, etc.).</p>
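<p>To make the contract concrete, here's a minimal skeleton hook (the <code>/etc</code> check is a stand-in; real hooks parse the JSON rather than grepping it):</p>
<pre><code class="language-bash">#!/bin/bash
# Minimal preToolUse hook skeleton: read the event, decide, exit.
set -euo pipefail

EVENT=$(cat)   # JSON describing the pending tool call arrives on stdin

# Stand-in check; real hooks parse the JSON instead of grepping it.
if echo "$EVENT" | grep -q '"/etc/'; then
  echo "BLOCKED: writes under /etc are not allowed." &gt;&amp;2   # stderr becomes the reason the agent sees
  exit 2   # block
fi
exit 0     # allow
</code></pre>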
<h2>Dependency pinning</h2>
<p>LiteLLM gets 3.4 million PyPI downloads per day. On March 24, two versions shipped with a credential stealer targeting SSH keys, IAM keypairs, and Kubernetes secrets. Pinning to exact versions is the standard mitigation, but agents don't do it by default.</p>
<p>This hook blocks any write to <code>package.json</code>, <code>requirements.txt</code>, <code>pyproject.toml</code>, or <code>Cargo.toml</code> if versions aren't pinned. <code>boto3&gt;=1.40</code> gets rejected. <code>boto3==1.40.0</code> passes.</p>
<p><strong>Hook config:</strong></p>
<pre><code class="language-json">{
  "matcher": "fs_write",
  "command": ".kiro/hooks/check-dependency-pins.sh",
  "description": "Block writes to dependency files with unpinned versions"
}
</code></pre>
<p><strong>Script preview</strong> (<a href="https://github.com/aws-samples/sample-kiro-cli-multiagent-development/blob/main/hooks/check-dependency-pins.sh">full source</a>):</p>
<pre><code class="language-bash">#!/bin/bash
# Hook: Validate dependency files have pinned versions before writing.
# Trigger: preToolUse on fs_write
# Exit 0 = allow, Exit 2 = block (returns STDERR to LLM)
set -euo pipefail

EVENT=$(cat)
_HOOK_EVENT="$EVENT" python3 &lt;&lt; 'PYEOF'
import json, sys, re, os

def check_package_json(content, filename):
    try:
        data = json.loads(content)
    except Exception:
        return []
    bad = []
    for section in ('dependencies', 'devDependencies', 'peerDependencies'):
        deps = data.get(section, {})
        for name, ver in deps.items():
            if re.search(r'[\^~*x]|&gt;=|&gt;|latest', str(ver)):
                bad.append(f'  {section}.{name}: {ver}')
    return bad

def check_requirements_txt(content, filename):
    bad = []
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or line.startswith('-'):
            continue
        if re.match(r'^[a-zA-Z0-9_.-]+\s*(\[.*\])?\s*==\s*[0-9]', line):
            continue
        bad.append(f'  {line}')
    return bad
# ... checkers for pyproject.toml and Cargo.toml, then dispatch logic
PYEOF
</code></pre>
<p>Each ecosystem gets its own checker. <code>package.json</code> catches <code>^</code>, <code>~</code>, <code>*</code>, and <code>latest</code>. <code>requirements.txt</code> requires <code>==</code>. Cargo catches bare versions (which Cargo treats as <code>^</code>).</p>
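<p>The Cargo checker is elided in the preview. Here's a sketch of what it plausibly does (my reconstruction, not the published source), handling only simple <code>name = "version"</code> entries:</p>
<pre><code class="language-python">import re

# Hypothetical sketch of the elided Cargo.toml checker. Cargo treats a bare
# "1.2" as "^1.2", so only "=1.2.3" pins an exact version.
def check_cargo_toml(content, filename):
    bad = []
    in_deps = False
    for line in content.splitlines():
        line = line.strip()
        if line.startswith('['):
            section = line.strip('[]').split('.')[0]
            in_deps = section in ('dependencies', 'dev-dependencies', 'build-dependencies')
            continue
        if not in_deps or not line or line.startswith('#'):
            continue
        # Only handles simple `name = "version"` entries, not inline tables.
        m = re.match(r'^([A-Za-z0-9_-]+)\s*=\s*"([^"]+)"', line)
        if m and not m.group(2).startswith('='):
            bad.append(f'  {m.group(1)}: {m.group(2)} (bare version is treated as ^)')
    return bad
</code></pre>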
<h2>Secrets scanning</h2>
<p>The hook scans every file write for patterns matching AWS keys, private keys, GitHub tokens, Slack tokens, and generic API key shapes. Match found, write blocked. The agent gets told to use environment variables or a secrets manager instead.</p>
<p><strong>Hook config:</strong></p>
<pre><code class="language-json">{
  "matcher": "fs_write",
  "command": ".kiro/hooks/check-secrets.sh",
  "description": "Block writes containing secrets or API keys"
}
</code></pre>
<p><strong>Script preview</strong> (<a href="https://github.com/aws-samples/sample-kiro-cli-multiagent-development/blob/main/hooks/check-secrets.sh">full source</a>):</p>
<pre><code class="language-bash">#!/bin/bash
# Hook: Block writes containing secrets (API keys, private keys, tokens).
# Trigger: preToolUse on fs_write
# Exit 0 = allow, Exit 2 = block (returns STDERR to LLM)
set -euo pipefail

EVENT=$(cat)
_HOOK_EVENT="$EVENT" python3 &lt;&lt; 'PYEOF'
import json, sys, re, os

ALLOWLISTED_EXTENSIONS = {'.md', '.example', '.sample', '.template'}
ALLOWLISTED_LINE_PATTERNS = re.compile(r'EXAMPLE|PLACEHOLDER|&lt;[A-Z_]+&gt;')

SECRET_PATTERNS = [
    ('AWS Access Key ID',     re.compile(r'AKIA[0-9A-Z]{16}')),
    ('AWS Secret Access Key', re.compile(r'(?&lt;![A-Za-z0-9/+])[A-Za-z0-9/+=]{40}(?![A-Za-z0-9/+=])')),
    ('RSA/EC Private Key',    re.compile(r'-----BEGIN\s+(RSA|EC|DSA|OPENSSH)?\s*PRIVATE KEY-----')),
    ('GitHub Token',          re.compile(r'gh[ps]_[A-Za-z0-9_]{36,}')),
    ('Slack Token',           re.compile(r'xox[bpors]-[A-Za-z0-9-]+')),
    ('Generic API Key',       re.compile(r'(?i)(api[_-]?key|api[_-]?secret|auth[_-]?token)\s*[:=]\s*["\']?[A-Za-z0-9_\-]{20,}')),
]
# ... scans each line of each file write, reports findings, exits 2 if blocked
PYEOF
</code></pre>
<p>It allowlists markdown files and lines containing "EXAMPLE" or "PLACEHOLDER" so documentation passes through. Not a replacement for git-secrets or truffleHog — it's the layer that catches things before they enter git history at all.</p>
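<p>The scan loop itself is elided. A sketch of what it could look like, continuing the heredoc above and reusing its <code>ALLOWLISTED_*</code> and <code>SECRET_PATTERNS</code> definitions (the <code>tool_input</code>/<code>ops</code> event shape is borrowed from the config-drift script later in this post, so treat it as an assumption here):</p>
<pre><code class="language-python">import json, os, sys

# Continues the heredoc above: ALLOWLISTED_EXTENSIONS, ALLOWLISTED_LINE_PATTERNS,
# and SECRET_PATTERNS are defined there. The tool_input/ops shape is an assumption.
event = json.loads(os.environ['_HOOK_EVENT'])
inp = event.get('tool_input', {})
findings = []
for op in inp.get('ops', [inp]):
    path = op.get('path', '')
    if os.path.splitext(path)[1] in ALLOWLISTED_EXTENSIONS:
        continue
    for lineno, line in enumerate(str(op.get('content', '')).splitlines(), 1):
        if ALLOWLISTED_LINE_PATTERNS.search(line):
            continue
        for name, pattern in SECRET_PATTERNS:
            if pattern.search(line):
                findings.append(f'{path}:{lineno}: possible {name}')
if findings:
    print('BLOCKED: potential secrets detected:', file=sys.stderr)
    print('\n'.join(findings), file=sys.stderr)
    print('Use environment variables or a secrets manager instead.', file=sys.stderr)
    sys.exit(2)
sys.exit(0)
</code></pre>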
<h2>Destructive command blocking</h2>
<p>Parses every shell command before execution and blocks the irreversible ones. <code>rm -rf /</code>, <code>DROP TABLE</code>, <code>terraform destroy</code> without <code>-target</code>, <code>git push --force</code> to main, <code>kubectl delete namespace kube-system</code>.</p>
<p><strong>Hook config:</strong></p>
<pre><code class="language-json">{
  "matcher": "execute_bash",
  "command": ".kiro/hooks/guard-destructive-commands.sh",
  "description": "Block dangerous shell commands"
}
</code></pre>
<p><strong>Script preview</strong> (<a href="https://github.com/aws-samples/sample-kiro-cli-multiagent-development/blob/main/hooks/guard-destructive-commands.sh">full source</a>):</p>
<pre><code class="language-bash">#!/bin/bash
# Hook: Block dangerous shell commands that could cause irreversible damage.
# Trigger: preToolUse on execute_bash
# Exit 0 = allow, Exit 2 = block (returns STDERR to LLM)
set -euo pipefail

EVENT=$(cat)
_HOOK_EVENT="$EVENT" python3 &lt;&lt; 'PYEOF'
import json, sys, re, os

RULES = [
    # (description, pattern that BLOCKS, pattern that ALLOWS as exception)
    (
        'Recursive delete of root/home',
        re.compile(r'rm\s+.*-[a-zA-Z]*r[a-zA-Z]*f[a-zA-Z]*\s+(/\s|/$|~\s|~$|\$HOME\s|\$HOME$)', re.MULTILINE),
        None,
    ),
    (
        'SQL destructive operation',
        re.compile(r'\b(DROP\s+(TABLE|DATABASE)|TRUNCATE\s+TABLE)\b', re.IGNORECASE),
        None,
    ),
    (
        'Terraform destroy without target',
        re.compile(r'terraform\s+destroy\b'),
        re.compile(r'terraform\s+destroy\s+.*-target\b'),
    ),
    (
        'Docker system prune',
        re.compile(r'docker\s+system\s+prune\b'),
        None,
    ),
    (
        'Disk format or raw write',
        re.compile(r'\b(mkfs\b|dd\s+if=.*/dev/)'),
        None,
    ),
    (
        'Force push to protected branch',
        re.compile(r'git\s+push\s+.*--force.*\b(main|master|production)\b'),
        None,
    ),
    (
        'Delete critical Kubernetes namespace',
        re.compile(r'kubectl\s+delete\s+namespace\s+(kube-system|production|default)\b'),
        None,
    ),
]
# ... parses command from event, checks each rule, exits 2 if blocked
PYEOF
</code></pre>
<p>Safe versions pass through. <code>rm -rf ./node_modules</code> is fine. <code>terraform destroy -target=aws_instance.foo</code> is fine. Each rule has a block pattern and an optional allow pattern for the safe variant.</p>
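<p>The dispatch logic is elided as well; here's a sketch under the same event-shape assumption, continuing the heredoc above:</p>
<pre><code class="language-python">import json, os, sys

# Continues the heredoc above: RULES is the list just defined. The
# tool_input.command field name is an assumption about the event schema.
event = json.loads(os.environ['_HOOK_EVENT'])
command = event.get('tool_input', {}).get('command', '')
for description, block, allow in RULES:
    if block.search(command) and not (allow and allow.search(command)):
        print(f'BLOCKED: {description}', file=sys.stderr)
        print(f'Command: {command}', file=sys.stderr)
        sys.exit(2)
sys.exit(0)
</code></pre>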
<h2>Config drift protection</h2>
<p>This one blocks writes to the steering rules, skills, and agent config directories. The agent can't modify the rules it operates under without explicit user approval.</p>
<p><strong>Hook config:</strong></p>
<pre><code class="language-json">{
  "matcher": "fs_write",
  "command": ".kiro/hooks/config-drift-guard.sh",
  "description": "Block writes to steering/skills/agents without approval"
}
</code></pre>
<p><strong>Full script</strong> (<a href="https://github.com/aws-samples/sample-kiro-cli-multiagent-development/blob/main/hooks/config-drift-guard.sh">source</a>):</p>
<pre><code class="language-bash">#!/bin/bash
# Hook: Block writes to agent config files unless explicitly approved.
# Trigger: preToolUse on fs_write
# Exit 0 = allow, Exit 2 = block (returns STDERR to LLM)
#
# Protected paths: ~/.kiro/steering/, ~/.kiro/skills/, ~/.kiro/agents/
# To allow config writes, set KIRO_ALLOW_CONFIG_WRITES=1
set -euo pipefail

[ "${KIRO_ALLOW_CONFIG_WRITES:-}" = "1" ] &amp;&amp; exit 0

EVENT=$(cat)
_HOOK_EVENT="$EVENT" python3 &lt;&lt; 'PYEOF'
import json, sys, os

KIRO_DIR = os.path.realpath(os.path.expanduser("~/.kiro"))
PROTECTED = [
    os.path.join(KIRO_DIR, "steering"),
    os.path.join(KIRO_DIR, "skills"),
    os.path.join(KIRO_DIR, "agents"),
]

try:
    event = json.loads(os.environ['_HOOK_EVENT'])
except (KeyError, json.JSONDecodeError) as e:
    print(f"Hook error: failed to parse event: {e}", file=sys.stderr)
    sys.exit(1)

inp = event.get('tool_input', {})
ops = inp.get('ops', [inp])
for op in ops:
    path = op.get('path', '')
    if not path:
        continue
    resolved = os.path.realpath(os.path.expanduser(path))
    for protected in PROTECTED:
        if resolved.startswith(protected + os.sep) or resolved == protected:
            print("BLOCKED: Writing to Kiro configuration files requires explicit approval.", file=sys.stderr)
            print(f"File: {resolved}", file=sys.stderr)
            print("Ask the user to approve this change before proceeding.", file=sys.stderr)
            sys.exit(2)
sys.exit(0)
PYEOF
exit $?
</code></pre>
<p>An agent optimizing for task completion has an incentive to relax its own constraints. This hook removes that option. Set <code>KIRO_ALLOW_CONFIG_WRITES=1</code> when you intentionally want to update config.</p>
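<p>Since it's a plain environment variable, the override works however you launch Kiro CLI:</p>
<pre><code class="language-bash"># Temporarily permit config writes, then close the gate again
export KIRO_ALLOW_CONFIG_WRITES=1
# ... apply the intended steering/skills/agents changes ...
unset KIRO_ALLOW_CONFIG_WRITES
</code></pre>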
<h2>Putting it all together</h2>
<p>Here's the complete hooks config. All four hooks fire on <code>preToolUse</code> — three on <code>fs_write</code>, one on <code>execute_bash</code>:</p>
<pre><code class="language-json">"hooks": {
  "preToolUse": [
    {
      "matcher": "fs_write",
      "command": ".kiro/hooks/check-dependency-pins.sh",
      "description": "Block writes to dependency files with unpinned versions"
    },
    {
      "matcher": "fs_write",
      "command": ".kiro/hooks/check-secrets.sh",
      "description": "Block writes containing secrets or API keys"
    },
    {
      "matcher": "fs_write",
      "command": ".kiro/hooks/config-drift-guard.sh",
      "description": "Block writes to steering/skills/agents without approval"
    },
    {
      "matcher": "execute_bash",
      "command": ".kiro/hooks/guard-destructive-commands.sh",
      "description": "Block dangerous shell commands"
    }
  ]
}
</code></pre>
<h2>The design principle</h2>
<p>None of these hooks reduce what the agent can do. They validate actions between intent and execution. The agent writes code, runs commands, modifies files — it just clears a checkpoint before doing the irreversible, insecure, or self-modifying ones.</p>
<p>This and more posted to AWS Samples: <a href="https://github.com/aws-samples/sample-kiro-cli-multiagent-development">github.com/aws-samples/sample-kiro-cli-multiagent-development</a></p>
<p>What hooks are you running? I'd like to hear what others are catching — drop a comment or send me a message.</p>
]]></content:encoded></item><item><title><![CDATA[The Kiro CLI Flywheel: Teaching your AI agent from its own mistakes]]></title><description><![CDATA[Every time you correct your AI coding agent, you're generating training data, but are you just throwing it away?
"You need to run CLI commands non-interactively." "Try again but split it into smaller ]]></description><link>https://blog.patrickjduffy.net/the-kiro-cli-flywheel-teaching-your-ai-agent-from-its-own-mistakes</link><guid isPermaLink="true">https://blog.patrickjduffy.net/the-kiro-cli-flywheel-teaching-your-ai-agent-from-its-own-mistakes</guid><dc:creator><![CDATA[Patrick]]></dc:creator><pubDate>Thu, 02 Apr 2026 20:31:17 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/5f51966981d8ab6c5765ed1a/52e548fb-fa2a-450e-bcf0-3fc64e3db357.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every time you correct your AI coding agent, you're generating training data, but are you just throwing it away?</p>
<p>"You need to run CLI commands non-interactively." "Try again but split it into smaller calls." "Stop — I already told you to use the staging bucket." These corrections are signals. They reveal gaps between what the agent knows and how you actually work. The problem is that most agent setups treat each session as disposable. The agent makes the same mistake on Tuesday that you corrected on Monday.</p>
<p>What if the agent could learn from those corrections — not through fine-tuning or retraining, but by updating its own configuration?</p>
<h2>The configuration hierarchy</h2>
<p><a href="https://kiro.dev/cli/">Kiro CLI</a> organizes agent behavior through a layered configuration system:</p>
<ul>
<li><p><strong>Steering docs</strong> — universal behavioral rules that apply to all agents (e.g., "always use non-interactive commands," "pin dependency versions")</p>
</li>
<li><p><strong>Skills</strong> — domain-specific knowledge packages (e.g., AWS CLI best practices, Docker build patterns)</p>
</li>
<li><p><strong>Agent prompts</strong> — per-agent instructions that define personality, capabilities, and constraints</p>
</li>
<li><p><strong>Hooks</strong> — scripts that fire at agent lifecycle points to enforce policy, collect data, or block unsafe operations (e.g., reject unpinned dependencies, prevent unauthorized config changes)</p>
</li>
</ul>
<p>When an agent does something wrong, the root cause usually maps to one of these layers: a rule that should have constrained it doesn't exist yet, it lacked the domain knowledge for the task, or its agent-specific instructions are too vague.</p>
<h2>The flywheel concept</h2>
<p>The flywheel is a stored prompt that turns session history into configuration improvements. It works in five phases:</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f51966981d8ab6c5765ed1a/9469db64-2d92-41a7-bb05-9dd7ceb576d9.png" alt="" style="display:block;margin:0 auto" />

<p><strong>Phase 1 — Session analysis.</strong> A <code>stop</code> hook logs a lightweight summary after every agent turn to <code>~/.kiro/flywheel-log.jsonl</code> — timestamp, working directory, and a response preview. The flywheel reads this log first for a fast index of recent activity, then dives into the full session JSONL for correction details: explicit corrections, redirections, cancelled turns followed by rephrased requests, repeated instructions, and frustration signals where the user simplifies their request after a failed attempt.</p>
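<p>A minimal sketch of what that <code>stop</code> hook could look like, following the same stdin-JSON pattern as the preToolUse hooks (the <code>response</code> field name is an assumption about the event schema, not taken from the published source):</p>
<pre><code class="language-bash">#!/bin/bash
# Hypothetical stop hook: append a one-line turn summary to the flywheel log.
set -euo pipefail

EVENT=$(cat)
_HOOK_EVENT="$EVENT" python3 &lt;&lt; 'PYEOF'
import json, os, time

event = json.loads(os.environ.get('_HOOK_EVENT', '{}'))
entry = {
    'ts': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
    'cwd': os.getcwd(),
    # 'response' is an assumed field name; adjust to the real event schema.
    'preview': str(event.get('response', ''))[:200],
}
path = os.path.expanduser('~/.kiro/flywheel-log.jsonl')
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, 'a') as f:
    f.write(json.dumps(entry) + '\n')
PYEOF
</code></pre>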
<p><strong>Phase 2 — Pattern recognition.</strong> One-off mistakes get filtered out. The agent groups corrections by theme — "output too verbose," "wrong tool choice," "hallucinated API" — and focuses on patterns that appear across multiple sessions.</p>
<p><strong>Phase 3 — Cross-reference.</strong> For each pattern, the agent reads the existing steering docs, skills, and agent prompts to determine whether the issue is already covered (but too weakly worded), completely missing, or being ignored.</p>
<p><strong>Phase 4 — Propose changes.</strong> The agent writes a structured report with evidence (quoted user messages), root cause analysis, and draft content for each proposed configuration change. Each proposal is classified as a steering update, new skill, or agent prompt modification.</p>
<p><strong>Phase 5 — Interactive review.</strong> The agent walks through each proposal. You approve, modify, or skip. Approved changes get applied to the correct files with proper formatting. Nothing ships without your sign-off.</p>
<h2>What correction events look like</h2>
<p>Kiro stores session data as JSON metadata paired with JSONL conversation logs. Each line in the log is a typed event — user prompts, assistant messages, and tool results. The flywheel scans user prompts for patterns that indicate the preceding assistant action was wrong:</p>
<table>
<thead>
<tr>
<th>Signal</th>
<th>Example</th>
</tr>
</thead>
<tbody><tr>
<td>Explicit correction</td>
<td>"No, I meant the production account"</td>
</tr>
<tr>
<td>Redirection</td>
<td>"Instead, use the Converse API"</td>
</tr>
<tr>
<td>Retry after cancel</td>
<td>User cancels a turn, then rephrases the same request</td>
</tr>
<tr>
<td>Repeated instruction</td>
<td>User restates something from two messages ago</td>
</tr>
<tr>
<td>Simplification</td>
<td>User breaks a complex request into smaller pieces after a failure</td>
</tr>
</tbody></table>
<p>The key insight is that the correction message and the assistant message before it form a pair: what the agent did wrong, and what the user wanted instead. That pair, abstracted into a general principle, becomes a candidate configuration rule.</p>
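<p>A sketch of how that pairing could be extracted, assuming a simple event schema (the <code>type</code> and <code>content</code> field names are guesses about Kiro's log format, and the cue list is illustrative):</p>
<pre><code class="language-python">import json, re

# Phrases that suggest the previous assistant action was wrong (illustrative).
CORRECTION_CUES = re.compile(
    r"^(no[, ]|stop\b|instead\b|i already told you|try again|that's not)",
    re.IGNORECASE)

def correction_pairs(session_jsonl):
    """Yield (assistant_message, user_correction) pairs from one session log."""
    last_assistant = None
    with open(session_jsonl) as f:
        for line in f:
            event = json.loads(line)
            if event.get('type') == 'assistant':
                last_assistant = event.get('content', '')
            elif event.get('type') == 'user':
                text = str(event.get('content', '')).strip()
                if last_assistant and CORRECTION_CUES.search(text):
                    yield last_assistant, text
</code></pre>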
<h2>A concrete example</h2>
<p>Across several sessions, the agent generates tool calls with payloads that exceed size limits, forcing the user to say "try again but split up the work." This pattern appears three times in a week.</p>
<p>The flywheel identifies the pattern, checks existing steering docs, finds no rule covering output size management, and proposes a new steering doc:</p>
<pre><code class="language-markdown">---
inclusion: always
---

# Output Size Awareness

When writing content through tools (MCP servers, file writes, API calls), check whether the target has size constraints. If the payload is large, split it into multiple smaller operations proactively — don't wait for a failure or user correction.
</code></pre>
<p>You review it, tweak the wording, approve. The agent applies it. Next week, the agent chunks large writes automatically.</p>
<h2>Why this works</h2>
<p>The flywheel works because it operates on the same configuration layer the agent already reads. There's no model retraining, no fine-tuning pipeline, no vector database to maintain. It's just markdown files that get included in the agent's context on every session.</p>
<p>It also works because it's conservative by design. The prompt requires patterns across multiple sessions before proposing a change, quotes actual user messages as evidence, and never applies changes without explicit approval. A <code>preToolUse</code> hook called the "drift guard" enforces this — it blocks any write to steering, skills, or agent config files unless the user has approved the change. You're not handing the agent a blank check to rewrite its own instructions — you're reviewing pull requests against your agent's behavior, with a guardrail that prevents unauthorized self-modification.</p>
<p>The compound effect is what makes it a flywheel. Each correction you make today becomes a rule that prevents the same class of error tomorrow. Over weeks, the agent accumulates your preferences, your team's conventions, and your project's constraints — not as ephemeral context that disappears when the session ends, but as persistent configuration that shapes every future interaction.</p>
<h2>Getting started</h2>
<p>The implementation is a stored prompt file plus three hook scripts. The <code>stop</code> hook collects turn data passively. The drift guard hook prevents unauthorized config changes. The dependency pinning hook is a bonus — it blocks writes to <code>package.json</code> or <code>requirements.txt</code> with unpinned versions, demonstrating how hooks enforce policy beyond the flywheel use case.</p>
<p>For example code, check out this <a href="http://github.com/aws-samples/sample-kiro-cli-multiagent-development">AWS Sample</a> I published with my Kiro CLI configuration. If you just want an example of the hook code, I also saved it as a <a href="https://gist.github.com/PatrickJD/81586a69c04b3b5f4c090132450a91e0">gist</a>.</p>
<p>Run the flywheel periodically — weekly works well — or whenever you notice the agent repeating a mistake you've already corrected. Each run produces a report, and each approved change makes the next run's report shorter.</p>
<p>The best agent configuration isn't the one you write on day one. It's the one that evolves from how you actually use the tool.</p>
]]></content:encoded></item><item><title><![CDATA[Your AWS management account is running workloads — here's how to fix it]]></title><description><![CDATA[Here's a pattern I keep running into: a company started with a single AWS account years ago, deployed their production workloads there, and eventually created an organization around it. That original ]]></description><link>https://blog.patrickjduffy.net/your-aws-management-account-is-running-workloads-here-s-how-to-fix-it</link><guid isPermaLink="true">https://blog.patrickjduffy.net/your-aws-management-account-is-running-workloads-here-s-how-to-fix-it</guid><category><![CDATA[AWS]]></category><category><![CDATA[organization]]></category><category><![CDATA[AWS Organizations]]></category><category><![CDATA[migration]]></category><category><![CDATA[Governance]]></category><category><![CDATA[Security]]></category><dc:creator><![CDATA[Patrick]]></dc:creator><pubDate>Wed, 01 Apr 2026 18:56:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/5f51966981d8ab6c5765ed1a/ebd5863e-91a2-4fb5-bba2-bbb6acdad291.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Here's a pattern I keep running into: a company started with a single AWS account years ago, deployed their production workloads there, and eventually created an organization around it. That original account became the management account — and it's still running workloads.</p>
<p>Sometimes it's worse. The company has five, seven, or ten accounts scattered across a loosely structured organization and a handful of standalone accounts with no centralized governance at all. The management account runs production. <a href="https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps.html">Service control policies (SCPs)</a> don't apply to it. There's no unified security baseline. The standalone accounts have no guardrails.</p>
<p>If this sounds familiar, you're not alone. It's one of the most common multi-account anti-patterns I see, especially in small and mid-sized companies that grew their AWS footprint organically. The good news: there's a clean path forward. The less obvious news: restructuring your existing organization in place usually isn't it.</p>
<p>This post walks through when and how to create a new <a href="https://aws.amazon.com/organizations/">AWS Organization</a>, migrate your existing accounts into it, and establish the governance foundation you should have had from the start.</p>
<h2>Why running workloads in the management account is a problem</h2>
<p>The management account in AWS Organizations has a unique property that makes it a poor home for workloads: <a href="https://docs.aws.amazon.com/organizations/latest/userguide/orgs_best-practices_mgmt-acct.html"><strong>SCPs don't restrict it</strong></a>.</p>
<p>This means any guardrails you implement through SCPs (restricting regions, preventing public S3 buckets, requiring encryption) don't apply to the management account. If your production workloads run there, they operate outside your governance boundary. The management account should only handle organization management, billing, and identity management through IAM Identity Center.</p>
<p>Beyond SCPs, running workloads in the management account creates additional risks:</p>
<ul>
<li><p><strong>Blast radius.</strong> A compromised workload in the management account has implicit access to organizational operations. The management account can create accounts, attach policies, and modify the organization structure.</p>
</li>
<li><p><strong>Billing confusion.</strong> Workload costs in the management account are harder to attribute and track. Separating workloads into dedicated accounts enables cleaner cost allocation.</p>
</li>
<li><p><strong>Audit complexity.</strong> Compliance frameworks expect separation of duties between administrative and workload accounts. Mixing them complicates audit evidence collection.</p>
</li>
</ul>
<h2>Why you can't just restructure in place</h2>
<p>The intuitive fix is to move workloads out of the management account and restructure the existing organization. For some environments, that works, but there's a structural constraint that often makes it impractical: <strong>you cannot demote a management account to a member account</strong>.</p>
<p>If your management account runs production workloads and you want it to become a regular member account in a properly structured organization, you can't do that within the existing organization. The management account is the management account — permanently, for that organization.</p>
<p>You also can't transfer the management account role to another account. The only way to change which account is the management account is to create a new organization with a different account at the top.</p>
<p>This is how I think about which option to use:</p>
<table>
<thead>
<tr>
<th>Scenario</th>
<th>Approach</th>
</tr>
</thead>
<tbody><tr>
<td>Management account is clean (no workloads) and you just need to restructure OUs</td>
<td>Restructure in place — move accounts between OUs, add SCPs</td>
</tr>
<tr>
<td>Management account runs workloads but you can easily migrate them out, or the organization heavily uses features like <a href="https://aws.amazon.com/ram/">AWS RAM</a></td>
<td>Migrate workloads to member accounts, then restructure in place</td>
</tr>
<tr>
<td>Management account runs workloads that are tightly coupled to the account (VPCs, databases, IAM roles with hardcoded ARNs)</td>
<td>Create a new organization with a clean management account</td>
</tr>
<tr>
<td>Standalone accounts exist outside any organization</td>
<td>Create a new organization and invite all accounts into it</td>
</tr>
</tbody></table>
<p>Before making a final decision, I recommend reading the excellent blog series <a href="https://aws.amazon.com/blogs/mt/aws-organizations-moving-an-organization-member-account-to-another-organization-part-1/">Moving an Organization member account to another organization</a> to ensure you aren't using any organization features that could break applications when accounts move.</p>
<h2>The migration: four phases</h2>
<h3>Phase 1: Stand up the new organization</h3>
<p>Create a new, dedicated AWS account that will serve as the management account for your new organization. This account should have:</p>
<ul>
<li><p>A group email address (not a personal email) for the root user</p>
</li>
<li><p>MFA enabled on the root user</p>
</li>
<li><p>No workloads — ever</p>
</li>
</ul>
<p>Create the organization from this account with all features enabled. Then deploy <a href="https://aws.amazon.com/controltower/">AWS Control Tower</a> as your landing zone foundation. Control Tower automates the creation of a Security OU with dedicated Log Archive and Audit accounts, establishes baseline controls, and provides a framework for ongoing governance.</p>
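<p>If you script this part, organization creation is a single call from the new management account (the root ID and OU name below are placeholders):</p>
<pre><code class="language-bash"># Create the organization with all features enabled (not just consolidated billing)
aws organizations create-organization --feature-set ALL

# Find the root ID, then create your first OU
aws organizations list-roots --query 'Roots[0].Id' --output text
aws organizations create-organizational-unit \
    --parent-id r-examplerootid \
    --name Workloads
</code></pre>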
<p>Design your OU structure based on the <a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/security-reference-architecture/introduction.html">AWS Security Reference Architecture (SRA)</a>. At minimum, separate security, production workloads, non-production workloads, and sandbox environments.</p>
<h3>Phase 2: Establish the security baseline</h3>
<p>Before migrating any accounts, establish the security services that will apply organization-wide. Enable these through delegated administrators:</p>
<ul>
<li><p><a href="https://aws.amazon.com/security-hub/"><strong>AWS Security Hub</strong></a></p>
</li>
<li><p><a href="https://aws.amazon.com/security-hub/cspm/"><strong>AWS Security Hub CSPM</strong></a></p>
</li>
<li><p><a href="https://aws.amazon.com/guardduty/"><strong>Amazon GuardDuty</strong></a></p>
</li>
<li><p><a href="https://aws.amazon.com/inspector/"><strong>Amazon Inspector</strong></a></p>
</li>
</ul>
<p>Enable these at the organization level so every account that joins the organization automatically inherits the baseline. This is the key advantage of establishing the baseline before migration — you don't have to retrofit security into accounts after the fact.</p>
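<p>For example, designating your Audit account as the delegated administrator for GuardDuty and Security Hub looks like this from the management account (the account ID is a placeholder):</p>
<pre><code class="language-bash"># Designate the delegated administrator for GuardDuty
aws guardduty enable-organization-admin-account \
    --admin-account-id 111122223333

# And for Security Hub
aws securityhub enable-organization-admin-account \
    --admin-account-id 111122223333
</code></pre>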
<h3>Phase 3: Migrate accounts</h3>
<p>Account migration follows a specific sequence. The order matters because of how AWS Organizations handles the management account.</p>
<p><strong>Step 1: Migrate standalone accounts first.</strong> These are the simplest — they aren't in any organization, so there's nothing to leave. From your new management account, <a href="https://docs.aws.amazon.com/organizations/latest/userguide/orgs_account_migration.html">send an invitation</a>. From the standalone account, accept it. The account joins the new organization and immediately inherits your SCPs and security baseline.</p>
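<p>The invitation flow in CLI form (account and handshake IDs are placeholders):</p>
<pre><code class="language-bash"># From the new management account: send the invitation
aws organizations invite-account-to-organization \
    --target '{"Id": "111122223333", "Type": "ACCOUNT"}'

# From the standalone account: find the open handshake and accept it
aws organizations list-handshakes-for-account \
    --query 'Handshakes[?State==`OPEN`].Id' --output text
aws organizations accept-handshake --handshake-id h-examplehandshakeid
</code></pre>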
<p><strong>Step 2: Migrate member accounts from the old organization.</strong> This has recently gotten easier with the new feature allowing <a href="https://aws.amazon.com/about-aws/whats-new/2025/11/aws-organizations-direct-account-transfers/">direct account transfers between organizations</a>. Invite the accounts from the new organization, then accept the invitations from each account while it's still a member of the old organization.</p>
<p><strong>Step 3: Migrate the old management account last.</strong> The management account can only leave its organization after all member accounts have been removed and the organization is deleted. Once the old organization is dissolved, the former management account becomes a standalone account and can accept an invitation to join the new organization as a regular member account.</p>
<p>This sequence ensures continuity — the old organization continues to function for remaining accounts while you migrate them one at a time.</p>
<p><strong>Before migrating each account</strong>, review:</p>
<ul>
<li><p><strong>IAM policies</strong> that reference the old organization ID or management account ID</p>
</li>
<li><p><strong>SCPs</strong> from the old organization that the account depended on (you'll need equivalent policies in the new org)</p>
</li>
<li><p><strong>Billing</strong> — Reserved Instances and Savings Plans transfer with the account, but verify cost allocation tags and billing preferences</p>
</li>
<li><p><strong>Service integrations</strong> — CloudTrail, Config, and other services with organization-level configurations may need reconfiguration</p>
</li>
</ul>
<h3>Phase 4: Harden with governance controls</h3>
<p>With all accounts in the new organization, implement the governance layer:</p>
<p><strong>Service control policies (SCPs)</strong> — Start with deny-list SCPs that prevent the most dangerous actions: disabling CloudTrail, deleting GuardDuty detectors, creating public S3 buckets, or launching resources in unapproved regions. SCPs now apply to every member account — including the account that used to be your old management account.</p>
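<p>As a starting point, a minimal deny-list SCP covering two of the guardrails above might look like this; extend the action list to match your own baseline:</p>
<pre><code class="language-json">{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyGuardrailTampering",
      "Effect": "Deny",
      "Action": [
        "cloudtrail:StopLogging",
        "cloudtrail:DeleteTrail",
        "guardduty:DeleteDetector"
      ],
      "Resource": "*"
    }
  ]
}
</code></pre>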
<p><strong>Resource control policies (RCPs)</strong> — A newer complement to SCPs. While SCPs restrict what IAM principals in your accounts can do, RCPs restrict what external principals can do to resources in your accounts. Together, they form the identity and resource boundaries of a <a href="https://docs.aws.amazon.com/whitepapers/latest/building-a-data-perimeter-on-aws/building-a-data-perimeter-on-aws.html">data perimeter</a>.</p>
<p><strong>Control Tower enrollment</strong> — Enroll all migrated accounts into Control Tower to apply the full set of preventive and detective controls. This provides continuous compliance monitoring and drift detection.</p>
<p><strong>Centralized logging</strong> — Verify that CloudTrail organization trails, Config recordings, and VPC Flow Logs all route to the Log Archive account in your Security OU.</p>
<h2>Common concerns</h2>
<p><strong>"This sounds disruptive — will my workloads go down during migration?"</strong></p>
<p>With proper planning, no. Moving an account between organizations is an administrative operation. It changes the account's organizational membership and billing relationship. It does not affect running resources, VPCs, EC2 instances, databases, or any other infrastructure within the account. Workloads continue running uninterrupted, provided you aren't relying on resource shares in RAM or other organization-specific configuration.</p>
<p><strong>"Can I do this incrementally?"</strong></p>
<p>Yes, and you should. Migrate one or two low-risk accounts first — a sandbox or development account — to validate the process. Confirm that the security baseline applies correctly, billing works as expected, and no IAM policies break. Then migrate production accounts with confidence.</p>
<p><strong>"What about consolidated billing?"</strong></p>
<p>Billing follows the organization. When an account moves to the new organization, its charges appear on the new management account's consolidated bill. Reserved Instances and Savings Plans (and their discount sharing) transfer with the account. Review your billing configuration after each migration to confirm cost allocation tags and billing alerts are intact.</p>
<h2>Cleaning up</h2>
<p>After all accounts are migrated and the old organization is dissolved:</p>
<ul>
<li><p>Close the old organization's management account if it's no longer needed, or repurpose it as a workload account in the new organization</p>
</li>
<li><p>Remove any stale IAM roles, policies, or cross-account trust relationships that reference the old organization ID</p>
</li>
<li><p>Update any automation, CI/CD pipelines, or scripts that reference the old management account ID or organization ID</p>
</li>
<li><p>Verify that all accounts appear in Control Tower and that no accounts are in a "not enrolled" state</p>
</li>
</ul>
<h2>Conclusion</h2>
<p>In this post, I walked through why running workloads in your AWS management account creates a governance gap, why restructuring in place often isn't viable, and how to migrate to a properly structured organization.</p>
<p>The core pattern: create a new organization with a clean, dedicated management account. Establish your security baseline (Security Hub/Security Hub CSPM, GuardDuty, Inspector) before migrating accounts. Migrate standalone accounts first, old organization members second, and the old management account last. Then harden with SCPs, RCPs, and Control Tower enrollment.</p>
<p>If you're operating with a management account that runs workloads or standalone accounts with no governance, this migration is the prerequisite for everything else — cost optimization, security hardening, compliance programs. You can't govern what you can't see, and you can't enforce what doesn't apply.</p>
<p>If you have questions or have gone through this migration yourself, leave a comment — I'd like to hear how it went.</p>
]]></content:encoded></item><item><title><![CDATA[Managing IAM Identities for People]]></title><description><![CDATA[Author Edit: This approach is wildly outdated now, you should use IAM Identity Center instead.

Understanding IAM security models is one of the most important aspects of effectively and safely managin]]></description><link>https://blog.patrickjduffy.net/managing-iam-identities-for-people</link><guid isPermaLink="true">https://blog.patrickjduffy.net/managing-iam-identities-for-people</guid><category><![CDATA[AWS]]></category><category><![CDATA[Security]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[Cloud]]></category><dc:creator><![CDATA[Patrick]]></dc:creator><pubDate>Tue, 29 Sep 2020 18:19:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1601232572436/LFCO1958A.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Author Edit: This approach is wildly outdated now, you should use <a href="https://docs.aws.amazon.com/singlesignon/latest/userguide/what-is.html">IAM Identity Center</a> instead.</p>
<hr />
<p>Understanding IAM security models is one of the most important aspects of effectively and safely managing AWS environments. Throughout my consulting work with various companies, I have noticed that many organizations manage the cloud in a similar fashion to how they managed on-premises environments. The purpose of this post is to review some strategies for managing identities for people in AWS; the target audience is users who might be newer to AWS IAM security principles.</p>
<hr />
<h3>AWS IAM Basics</h3>
<p>When we talk about IAM, it's important to note the differences between <em>Users</em>, <em>Roles</em>, <em>Policies</em>, and <em>Groups</em>:</p>
<ul>
<li><p>Users: Users are <em>identities</em> for individual people (or applications). Users can be added to groups with policies attached to the group, or have policies attached to them directly to grant permissions. Users can have two types of credentials: <strong>Console</strong> credentials to access the web console, and <strong>Access Keys</strong>, which are used for programmatic access through the AWS CLI or SDK.</p>
</li>
<li><p>Groups: Groups allow you to assign <strong>policies</strong> to the group, and then add users to the group to simplify credential management. Only 10 managed policies can be added to an individual group, but you may also add inline policies that are defined within the group.</p>
</li>
<li><p>Roles: Roles are used for a few things within AWS, but it primarily boils down to allowing users to <em>assume a role</em> in your account, including users from other AWS accounts, as well as allowing <em>AWS services</em> to access resources within your account.</p>
</li>
<li><p>Policies: Policies are JSON documents that define the permissions granted to the attached identity (user or group). You can use conditionals, variables, and wildcards in these policy documents to control access based on a variety of factors, such as whether MFA was used to log in or which AWS Region is being used. Managed AWS policies exist for baseline access, but you can also add your own policies for more granular control.</p>
</li>
</ul>
<p>Note that there are other aspects of IAM, such as Identity Providers, where you can connect to SAML or OpenID to set up single sign-on, and the IAM Access Analyzer, but those will be covered in a future post.</p>
<hr />
<h3>Securing IAM Access for Users</h3>
<p>To set up secure access to AWS, there are a few key things to consider:</p>
<ul>
<li><p>Each person should have only <strong>one</strong> IAM user.</p>
</li>
<li><p>Permissions should be assigned with the least privilege needed for the user.</p>
</li>
<li><p>Permissions should be assigned by groups, and not individually to users.</p>
</li>
<li><p>MFA should be enabled for all users.</p>
</li>
<li><p>Access Keys and Passwords should be rotated on a regular basis.</p>
</li>
<li><p>Using temporary permissions by <em>assuming roles</em> is preferred.</p>
</li>
</ul>
<h3>Putting It All Together</h3>
<p>To put this all together, let's take a practical example of setting up read access for a user.</p>
<p>First, we will create a read-only role for our user to assume in the console.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1601241606494/6hYEJL_n2.png" alt="An image of the AWS console role creation step 1" style="display:block;margin:0 auto" />

<p>One thing to note when creating this role is that we have selected "Another AWS Account", but you can specify the account your user is in as well! One advantage of this approach is that you can default to giving your users very few permissions, perhaps only access to IAM to complete basic tasks like setting up MFA, plus the ability to assume roles. This is helpful because if an account is compromised, the attacker will not be able to simply query the API to list your resources or make destructive changes. They would first need to know the name of the role to assume to gain the appropriate level of access. This is, admittedly, a bit of "security through obscurity," but it can still be effective when combined with other best practices.</p>
<p>Next, we give the role permissions. For this example, I'm attaching the AWS managed policy "ReadOnlyAccess".</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1601392164209/0aMg37TY6.png" alt="An image of the AWS console role creation step 2" style="display:block;margin:0 auto" />

<p>I'm not adding any tags for this example, but tags are a great way to identify ownership of resources.</p>
<p>Last, but not least, give the role a name, and note the name because you will need it to assume the role.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1601395038269/CKR0OG5P8.png" alt="An image of the AWS console role creation step 4" style="display:block;margin:0 auto" />

<hr />
<p>Now we need a policy document that allows us to <em>assume</em> the role in our (or another!) account.</p>
<p>Let's create a policy to allow access to assume role. Create a policy in the AWS console. If you use the visual editor, choose the STS service and select "AssumeRole" under "Write". Under resources, enter the account number and role name that you want your user to be able to assume. Remember, this can be the current account <em>or</em> another AWS account. You can also wildcard here, so if you use the same role name across all your accounts you can simply wildcard the account number to allow your user to assume that role across all your accounts.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1601395601732/JXoHliN6Eb.png" alt="A screenshot of the role creation in the AWS Console" style="display:block;margin:0 auto" />

<p>You can also simply enter the JSON as follows. Make sure you change the account number to your account number!</p>
<pre><code class="language-plaintext">{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::123456789012:role/ReadOnlyAccess"
        }
    ]
}
</code></pre>
<p>Once you create the policy, attach it to a group, and make sure the users who need access are added to the group.</p>
<hr />
<h3>Assuming the Role</h3>
<p>Last, to test this out, click on your IAM username in the header of the AWS console and click on "Switch Roles". Enter the account number holding the role you will assume, and then the role name that you created earlier.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1601396441198/0Dfr_tC--.png" alt="A screenshot of the assume role function in the AWS Console" style="display:block;margin:0 auto" />

<p>Once you click on Switch Role, if everything is working correctly your username at the top will show the <em>role</em> you are currently using.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1601396948159/duido05KUW.png" alt="ReadOnlyAccess" style="display:block;margin:0 auto" />

<p>To demonstrate the access as well, I'm logging in to the S3 console. Note that I can see my CloudFormation template bucket:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1601397020704/U2fCCCZnh.png" alt="A screenshot of the S3 console" style="display:block;margin:0 auto" />

<p>But I cannot create a new bucket, because I am assuming a role that only has <em>read-only</em> access:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1601397489686/FyQ5wOSSh.png" alt="image.png" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1601397745793/V21rtG5_W.jpeg" alt="Chef's Kiss" style="display:block;margin:0 auto" />

<hr />
<h3>Wrapping Up</h3>
<p>So, now that we have a centralized method to manage identity in AWS for our users, an important next step is to think about how to audit and manage ongoing usage. In a future post, I'll cover how to use Security Hub, AWS Config, and the IAM Access Analyzer to create a strong program around managing identity for users.</p>
<p>Now that you're assuming roles to access permissions in your account(s), I also recommend the <a href="https://github.com/tilfin/aws-extend-switch-roles">AWS Extend Switch Roles</a> plugin for Chrome/Firefox. It stores your role-switch configuration, rather than relying on the default Role History within the AWS Console.</p>
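<p>For reference, the plugin reads an INI-style configuration along these lines (key names from memory, so double-check the plugin's README):</p>
<pre><code class="language-plaintext">[ReadOnly-MyAccount]
aws_account_id = 123456789012
role_name = ReadOnlyAccess
color = 00aa00
</code></pre>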
<p>Lastly, here is a JSON policy document that requires your IAM users to log in with MFA before they can access nearly all AWS resources:</p>
<p><a class="embed-card" href="https://gist.github.com/PatrickJD/7aca25836048bf871ea72253c8531854">https://gist.github.com/PatrickJD/7aca25836048bf871ea72253c8531854</a></p>
]]></content:encoded></item><item><title><![CDATA[Using PowerShell to write Infrastructure-as-Code!]]></title><description><![CDATA[As an AWS Solutions Architect working at an APN partner, I frequently review workloads and environments for customers who are still very new to AWS, and one thing I’ve found frequently is a lack of Clo]]></description><link>https://blog.patrickjduffy.net/using-powershell-to-write-infrastructure-as-code</link><guid isPermaLink="true">https://blog.patrickjduffy.net/using-powershell-to-write-infrastructure-as-code</guid><category><![CDATA[Powershell]]></category><category><![CDATA[AWS]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[Cloud Computing]]></category><dc:creator><![CDATA[Patrick]]></dc:creator><pubDate>Fri, 04 Sep 2020 03:27:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1621098302023/bhxOVqUJx.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As an AWS Solutions Architect working at an APN partner, I frequently review workloads and environments for customers who are still very new to AWS, and one thing I’ve found frequently is a lack of CloudWatch alarms, with most infrastructure also being implemented directly in the console and not through Infrastructure-as-Code.</p>
<p>A more cloud-mature organization would have these alarms defined in IaC from the beginning, but when dealing with customers with hundreds of EC2 instances and other resources that were created in the console, I was looking for a quick way to create standard CloudWatch alarms. With my Windows admin background, PowerShell was a natural choice, helped along by how fantastic the AWS PowerShell tools are. My first iteration wrote alarms directly to CloudWatch by looping through a variable.</p>
<p>This worked well, but there are a few problems…</p>
<p>First, this only adds new alarms; it doesn't have logic for removing unused alarms. Second, we should really be using Infrastructure-as-Code to detect drift and make sure these alarms are not changed in the console. Third, we would have to run this frequently to keep alarms updated, which introduces another manual task (or means using <a href="https://aws.amazon.com/lambda/">AWS Lambda</a>, which has excellent PowerShell support!)</p>
<p>To resolve these issues, I decided to rewrite the script to instead <em>write CloudFormation</em> templates, rather than writing alarms directly. The second <a href="https://github.com/PatrickJD/AWS-Automation/blob/master/PowerShell/CFTemplateGen/Write-EC2CWAlarmCFTemplate.ps1">iteration</a> loops through resources and writes a resource block for each one into a CloudFormation template.</p>
<p>This is all tied together by a larger function that calls each individual function.</p>
<p>After a bit of debugging &amp; fixing the template, I’m happy with the results! Deploying with CloudFormation has some nice benefits, including being able to detect drift, customize the template further for outliers (which my original script did not handle well at all) and check our generated template into source control.</p>
<p>I think there is potential in this method to do much more! My immediate plan is to write more standard alarm generators for other services and custom metrics, but long term I think this could be an interesting way to generate CloudFormation templates for existing resources within AWS (similar to <a href="https://former2.com/">Former2</a>) without having to provide access keys to a 3rd party service.</p>
]]></content:encoded></item></channel></rss>