When project files become instructions: AI agents, CI pipelines and the new attack surface
AI tools now do more than answer prompts in a chat window. They read what is in the repo, pick up skills, connect to outside tools through MCP, and may bring in plugins that add more behaviour. That is now standard behaviour across mainstream products such as Codex, VS Code, GitHub Copilot and Claude Code.
This introduces a new security question. The risk is no longer limited to a prompt typed by a user. Repo content, local configuration, skills, plugins and CI context can all shape what an agent does, especially when that agent has access to tools, network paths or credentials. What looks like a harmless guidance file, local config file or plugin bundle can end up changing what the agent actually does.
OpenAI Codex, AGENTS.md guide (Link). “Codex reads AGENTS.md files before doing any work.” Note: Codex’s prompting guide says each discovered file becomes its own message in the request.
VS Code, custom instructions docs (Link). “…applies the instructions in this file to all chat requests within this workspace.” The same docs also describe subfolder-level instructions and file discovery across a repo tree.
GitHub Copilot, agent skills docs (Link). “folders of instructions, scripts, and resources that Copilot can load when relevant.”
Anthropic, Claude Code docs. (Link) “CLAUDE.md loads at session start” and “Create a SKILL.md file with instructions”. The docs also say CLAUDE.md is context rather than enforced configuration.
The basic problem
Most engineering teams already understand that a bad dependency, workflow file or CI setting can compromise a software supply chain. AI-assisted workflows create another path. The risk is easy to miss because it often arrives in forms that look routine: a repo guidance file, a local config file, a skill, a plugin, a starter repository, or a tool downloaded to help with day-to-day work. Once an agent reads those artefacts as instructions, they are no longer just reference material. They can change what the agent does.
That matters because these systems are increasingly being placed inside trusted workflows. They are used to write code, review pull requests, fix tests, generate documentation, triage issues, inspect data, update workflows and interact with external tools. In some environments they also sit inside CI pipelines or connect to systems that hold tokens, repository permissions, deployment paths and other sensitive access. At that point, the problem is no longer limited to a prompt typed by a user. Repo content, local configuration, packaged skills, plugins and CI context can all affect what the agent does next.
This is also not limited to software developers. The same pattern applies anywhere people routinely download scripts, tools, templates, starter repositories or packaged automation as part of their job. Security engineers, platform teams, consultants, researchers and other technical users are all exposed to the same basic risk: something that looks like ordinary material is trusted too quickly, then allowed to influence an agent or adjacent tool with access to files, systems or credentials.
The practical problem is that adoption is moving faster than review. Pressure to adopt AI tools the instant they are released, driven by the fear of being left behind, means security becomes an afterthought. That pressure makes unsafe behaviour easy to normalise. Guidance files are treated as harmless documentation. Skills and plugins are installed because they save time. Agents are given broad access because the workflow appears to work. Review often comes later, after the tool is already embedded in local development, CI or operational processes.
OWASP’s 2026 agentic guidance maps well to this. ASI01, Agent Goal Hijack, covers the case where attacker-controlled instructions change what the agent is trying to do. ASI02, Tool Misuse, covers the case where the agent then uses legitimate tools in harmful ways. ASI03, Identity and Privilege Abuse, and ASI04, Agentic Supply Chain Vulnerabilities, also fit closely, because the risk often depends on what the agent can access and which trusted components or packaged behaviours it has absorbed into its workflow.
OWASP Top 10 for Agentic Applications 2026 (Link). Useful entries here are ASI01 Agent Goal Hijack, ASI02 Tool Misuse, ASI03 Identity & Privilege Abuse and ASI04 Agentic Supply Chain Vulnerabilities.
How the instruction path works in real tools
VS Code says it will automatically detect AGENTS.md in the workspace root and apply the instructions in that file to all chat requests in that workspace. It also supports subfolder-level instructions, which means different parts of a repo can carry different instruction files.
Codex is similar. OpenAI says Codex reads AGENTS.md files before doing any work, and its prompting guide explains that each discovered file is added into the request as its own message. Importantly, more specific files can override broader ones.
Claude Code loads CLAUDE.md at session start and keeps it in the request context. It also supports skills, where the skill description is loaded early and the full skill content is loaded when the skill is used.
GitHub Copilot documents agent skills as folders of instructions, scripts and resources that Copilot can load when relevant. That wording matters: it is closer to a small plugin than to a comment block.
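A toy model makes the common thread across these tools concrete. Codex's prompting guide says each discovered file becomes its own message in the request; the sketch below assumes a generic chat-message layout (the roles and dict shape are illustrative, not any specific product's internals). The point is that repo content and the user's typed prompt arrive through the same channel.

```python
def build_request(user_prompt, instruction_texts):
    """Toy model of request assembly: each discovered instruction file
    becomes its own message ahead of the user's prompt, so the agent
    sees repo-supplied text and user-supplied text side by side."""
    messages = [{"role": "system", "content": t} for t in instruction_texts]
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

Once a file in the repo occupies the same position in the request as the user's prompt, "just documentation" is no longer an accurate description of it.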
Skills and plugins widen the attack surface
AGENTS.md is only part of the picture. Skills are an increasingly common way to package repeatable agent behaviour. They often include instructions, metadata and supporting files, and they are meant to be discovered and used by the agent when a task looks relevant.
OWASP’s Agentic Skills Top 10 is useful because it treats skills as their own security problem; skills are “the execution layer” that gives agents real-world impact. A skill is not just a better prompt. It is a packaged way of steering behaviour and, in some ecosystems, driving tools and scripts.
OWASP Agentic Skills Top 10. Short excerpt: skills are “the execution layer” that gives agents real-world impact.
Mitiga’s 2026 research gives an example of this. Their proof of concept showed how a malicious skill could silently exfiltrate a codebase. Even if defenders debate how likely a given demo is in their own environment, the point stands: skills now look enough like dependencies that they need the same review process.
Mitiga Labs, March 2026. AI Agent Supply Chain Risk: Silent Codebase Exfiltration via Skills.
Plugins take that a step further. OpenAI says Codex plugins can bundle skills, app integrations and MCP servers into reusable workflows. VS Code now supports agent plugins as prepackaged bundles that can include skills, commands, agents, hooks and MCP servers, although Microsoft still marks this feature as preview or experimental, at the time of writing.
Plugins in Codex and VS Code. OpenAI says plugins “bundle skills, app integrations, and MCP servers”. VS Code describes agent plugins as bundles that can include skills, commands, agents, hooks and MCP servers, and currently labels the feature preview or experimental.
What an attack looks like
A realistic attack does not need complex payloads.
A malicious instruction file, skill or plugin lands in the workspace. That could happen through a pull request, a starter repository, a copied internal/external template, or a third-party package installed with little review.
A user then asks the agent to do normal work: fix a test, trace a bug, clean up a module, write a function, generate documentation, analyse a dataset, or run a packaged workflow. The same pattern also applies outside software development. In many technical roles, people now download scripts, tools, starter repositories, plugins or agent bundles as part of ordinary work. Once those artefacts are trusted and given access to local files, network routes or credentials, they can shape what the agent or tool does next.
The hidden instructions change the job. They can tell the agent to gather config, inspect files, prefer certain tools, publish output elsewhere, and avoid mentioning the instructions at all.
The action can still look normal. The agent writes a report, opens a pull request, updates a workflow, posts a comment, fetches data through an integration, or writes a file that triggers another system.
The malicious prompt or action embedded in the repository can be as simple or as sophisticated as the attacker likes. Listening to developers around me, they often describe their workflows as “write the prompt, click go, and move on to something else while the agent chugs away”. These workflows are becoming trivial, ingrained parts of the developer’s flow, met with varying degrees of acceptance or reluctance.
The damage happens through allowed features. That might mean secret leakage, codebase exfiltration, workflow tampering, or unwanted changes shipped through the usual developer path. This is exactly why OWASP frames the problem around goal hijack, tool misuse, privilege abuse and supply-chain risk.
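One cheap defensive step against the pattern above is a heuristic pass over instruction files before an agent consumes them. The patterns below are hypothetical examples, not a complete or authoritative detector; a determined attacker can evade keyword matching, but it catches the lazy cases and forces the interesting ones into review.

```python
import re

# Illustrative heuristics only -- tune and extend for your environment.
SUSPICIOUS_PATTERNS = [
    (r"do not (mention|reveal|disclose)", "concealment directive"),
    (r"https?://\S+", "embedded URL"),
    (r"\b(curl|wget|nc)\b", "network tooling reference"),
    (r"(token|secret|credential|\.env)", "credential reference"),
]

def scan_instruction_file(text):
    """Return labels for suspicious content in an agent instruction file."""
    findings = []
    for pattern, label in SUSPICIOUS_PATTERNS:
        if re.search(pattern, text, re.IGNORECASE):
            findings.append(label)
    return findings
```

A hit is not proof of malice; it is a signal that a guidance file deserves the same human attention as a CI workflow change.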
Why CI and local machines make this worse
The risk gets sharper when the agent can reach something valuable.
In CI, that usually means tokens, repository write access, secrets, deployment paths or trusted network routes. Aikido’s PromptPwnd research in December 2025 described prompt injection paths in GitHub Actions and GitLab pipelines where untrusted content could influence AI agents working with privileged tooling. The exact exploit path will vary by pipeline, but the lesson is clear: if untrusted input reaches an agent prompt and the agent can act with CI privileges, you have a supply-chain problem. It is also worth noting that these aren’t edge cases: agents as part of the CI pipeline are becoming more popular and accepted.
Aikido, PromptPwnd research (Link). December 2025 research on prompt injection paths in GitHub Actions and GitLab workflows that use AI agents with privileged tooling.
On developer machines, the agent often has direct access to the repository, shell commands, browsers, local tokens and external tools. VS Code’s own agent documentation describes local agents as running on your machine with access to your workspace and tools. Claude Code’s memory docs make a key distinction: CLAUDE.md is context, not enforced configuration. In other words, these files influence behaviour, but they are not a hard control boundary.
MCP extends the reach again. The protocol is now a common way to connect agents to external tools and data sources. Its specification says authorisation is optional for MCP implementations, depending on transport and design. That does not make MCP unsafe by itself, but it does mean defenders need to look carefully at how a given server is authenticated, authorised and governed.
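That "look carefully" step can be partly automated. The sketch below assumes a Claude Code-style `.mcp.json` layout with an `mcpServers` object; the exact schema varies between clients, and the flag conditions are illustrative judgment calls, not a standard. It flags entries that launch local processes or talk to remote servers with no visible auth configuration.

```python
import json

def audit_mcp_config(raw):
    """Flag MCP server entries that deserve manual review.
    Assumes a {"mcpServers": {name: {...}}} layout (schema varies
    between MCP clients -- adjust the keys for yours)."""
    findings = []
    for name, entry in json.loads(raw).get("mcpServers", {}).items():
        if "command" in entry:
            findings.append((name, "launches a local process"))
        url = entry.get("url", "")
        if url.startswith("http://"):
            findings.append((name, "unencrypted transport"))
        if url and "headers" not in entry:
            findings.append((name, "remote server without auth headers"))
    return findings
```

None of these findings is automatically a vulnerability, but each one is a question a reviewer should be able to answer before the config ships.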
The surrounding ecosystem is also part of the risk
There are still relatively few public cases where a confirmed breach is attributed directly to an AGENTS.md-style instruction file, but the public record is already strong enough to show the attack paths are real. In December 2025, Aikido’s PromptPwnd research showed prompt injection inside GitHub Actions and GitLab CI workflows where AI agents consumed untrusted issue or pull-request content and then acted with privileged tooling. In the same month, Check Point disclosed CVE-2025-61260 in OpenAI Codex CLI, where project-local configuration could cause Codex to load and invoke attacker-controlled MCP server commands at startup. Earlier, Microsoft’s CVE-2025-53773 covered command injection in GitHub Copilot and Visual Studio, and Tenable separately showed prompt injection in Copilot Chat agent mode via a crafted filename in a repository. In March 2026, Mitiga demonstrated silent codebase exfiltration through a malicious skill. Alongside those agent-specific examples, the wider toolchain has already seen real supply-chain incidents, including the unauthorised [email protected] npm publish in February 2026 and the Shai-Hulud 2.0 npm worm in November 2025. Taken together, these cases show the same pattern: repo content, local configuration, installable agent components and CI context can all become execution paths when an agent is trusted to act.
What teams should do
Treat AGENTS.md, CLAUDE.md, SKILL.md, hooks, plugin manifests and MCP configuration as controlled configuration. Review changes to them the same way you would review CI workflows, deployment config or privileged scripts.
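Treating these files as controlled configuration can be enforced mechanically. A minimal sketch, fed with paths from e.g. `git diff --name-only`: the file names come from the article, but the directory prefixes are assumptions that should be adjusted to the tools actually in use.

```python
CONTROLLED_NAMES = {"AGENTS.md", "CLAUDE.md", "SKILL.md"}
# Directory prefixes are illustrative; adapt to your toolchain.
CONTROLLED_PREFIXES = (".github/workflows/", ".claude/", ".codex/")

def needs_privileged_review(changed_paths):
    """Return changed paths that should go through the same review
    gate as CI workflows or deployment config."""
    return [p for p in changed_paths
            if p.rsplit("/", 1)[-1] in CONTROLLED_NAMES
            or p.startswith(CONTROLLED_PREFIXES)]
```

Wired into a pre-merge check, this turns "someone should look at instruction-file changes" from a habit into a gate.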
Keep privileges narrow. An agent that reviews code does not also need release credentials. A CI agent that comments on a pull request does not need broad write access across workflows and settings. OWASP’s agentic guidance is clear that tool misuse and privilege abuse are core risks, not edge cases.
Control egress where you can. If a build agent or local sandbox cannot reach arbitrary external endpoints, many exfiltration paths disappear or become much harder.
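The egress principle is default-deny: a minimal sketch of the allow decision, with an example allowlist (the hosts shown are placeholders, not a recommendation). Real enforcement belongs in a proxy or network policy, not application code, but the decision logic is the same.

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.github.com", "registry.npmjs.org"}  # example allowlist

def egress_allowed(url):
    """Default-deny outbound check: permit only known hosts and their
    subdomains. Many exfiltration paths fail at exactly this point."""
    host = (urlparse(url).hostname or "").lower()
    return host in ALLOWED_HOSTS or any(
        host.endswith("." + allowed) for allowed in ALLOWED_HOSTS)
```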
Isolate agent execution. Prefer ephemeral environments for higher-risk tasks. Keep secrets away from routine workspace context. Use worktrees, sandboxes or separate runners where the product supports them. VS Code’s agent guidance, for example, explicitly notes isolation options such as Git worktrees for background use cases.
Treat skills and plugins like dependencies. Use internal allowlists or registries if the environment justifies it. Review provenance, bundled integrations, hooks and MCP servers before installation.
Log what the agent does, not just what the user asked. Tool invocation logs, file changes, network destinations and approval events are far more useful in an incident than a clean chat transcript.
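The shape of that logging can be as simple as a wrapper around every tool dispatch. This is a generic sketch, not any product's hook API: it records the tool name, arguments and outcome regardless of what the chat transcript claims happened.

```python
import time

def logged_tool_call(tool_name, args, tool_fn, log):
    """Record what the agent actually did -- tool, arguments, outcome --
    independent of the conversation transcript."""
    entry = {"ts": time.time(), "tool": tool_name, "args": args}
    try:
        result = tool_fn(**args)
        entry["ok"] = True
        return result
    except Exception as exc:
        entry["ok"] = False
        entry["error"] = repr(exc)
        raise
    finally:
        log.append(entry)  # logged whether the call succeeded or not
```

In an incident, this log answers "what did the agent touch and where did output go", which a clean chat transcript cannot.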
Closing point
This is still a new area for most organisations. The tooling is changing quickly, product boundaries are still shifting, and the safer ways to deploy and govern these workflows are still catching up. Many teams are under pressure to adopt AI-assisted tooling quickly, whether to move faster, reduce manual work, or avoid being seen as behind. That pressure is where mistakes creep in. Security review gets skipped, risky defaults get accepted, and tools that look helpful are trusted before anyone has properly examined how they behave in a local environment, a repository, or a CI pipeline. The point is not to avoid these tools. It is to adopt them with a clear view of what they can read, what they can influence, what they are allowed to do, and what controls, policies and procedures are needed to manage the risk.
If you have any questions or would like to speak to someone about this risk and how to manage it, we at Cyberis have been helping many of our clients, new and existing, navigate this rapidly changing technological environment and the new risks and attack surfaces it brings.
Improve your security
Our experienced team will identify and address your most critical information security concerns.