Reproducible AI Practices for Bioinformatics

Kelly Sovacool, PhD

Bioinformatics Software Engineer

CCR Collaborative Bioinformatics Resource

Jun 02, 2026

AI Accelerates Bioinformatics

Security in the age of AI

What do we mean by…?

  • Security
  • Safety
  • Reproducibility

Reproducibility

Yourself -> Future you -> Your collaborators -> Peers in your field

You are your own most important collaborator

Do’s and Don’ts

Do

  • Commit your code early and often
  • Use VS Code workspaces
  • Create a Copilot Instructions file

Don’t

  • Expose PHI or PII
  • Blindly accept AI-generated code
  • Allow AI agents to execute destructive commands

Do: commit your code

and sync it with GitHub

  • Create a GitHub repo for every project
  • Commit & push your code before you even start using AI agents
  • Commit & push every meaningful change
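A minimal sketch of a check to run before starting an agent session, assuming Python 3.9+ and git on the PATH. The function names are hypothetical. It relies on `git status --porcelain`, which prints one line per changed or untracked file and nothing when the working tree is clean.

```python
import subprocess


def is_clean(porcelain_output: str) -> bool:
    """Return True when `git status --porcelain` output lists no changes."""
    return porcelain_output.strip() == ""


def changed_paths(porcelain_output: str) -> list[str]:
    """Extract paths from porcelain lines: two status characters, a space, then the path."""
    return [line[3:] for line in porcelain_output.splitlines() if line.strip()]


def working_tree_status(repo_dir: str = ".") -> str:
    """Run git inside repo_dir and return its porcelain status output."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

If `is_clean(working_tree_status())` is False, commit and push first so every agent edit shows up as a reviewable diff against a known state.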

Do: use VS Code workspaces

Add related repos to a single VS Code workspace so Copilot can see the full context of your project.

Copilot agents search your entire codebase to understand how components connect and to ground their answers in your actual code. You can use broad prompts like “where is authentication handled?” or “add tests for the list endpoint” and get answers and edits based on your codebase. You still need to review them.
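A multi-root workspace is stored as a JSON file with a `.code-workspace` extension that lists each folder. The sketch below generates one from a list of repo paths. The function names and the example paths are hypothetical.

```python
import json


def build_workspace(folder_paths: list[str]) -> dict:
    """Build the dictionary VS Code expects in a .code-workspace file."""
    return {"folders": [{"path": p} for p in folder_paths], "settings": {}}


def write_workspace(folder_paths: list[str], out_path: str) -> None:
    """Write the workspace definition to out_path as indented JSON."""
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(build_workspace(folder_paths), fh, indent=2)
        fh.write("\n")
```

For example, `write_workspace(["pipeline", "../shared-utils"], "project.code-workspace")` produces a file that can be opened with File > Open Workspace from File.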

Do: create a Copilot Instructions file

VS Code automatically detects a .github/copilot-instructions.md file and applies it to all chat requests in the workspace.

Use custom instructions for:

  • Coding style and naming conventions, preferred libraries/packages
  • Architectural patterns to follow or avoid
  • Security requirements and error handling approaches
  • Documentation standards

Keep it short (< 500 lines), include examples, and explain the reasoning behind rules.
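A small sketch of a check that the instructions file exists and stays under the 500-line guideline above. The function names and return strings are hypothetical. Only the file location and the line limit come from this deck.

```python
from pathlib import Path


def count_lines(text: str) -> int:
    """Count lines in a block of text."""
    return len(text.splitlines())


def check_instructions(repo_dir: str, max_lines: int = 500) -> str:
    """Return 'missing', 'too long', or 'ok' for the repo's Copilot instructions file."""
    path = Path(repo_dir) / ".github" / "copilot-instructions.md"
    if not path.is_file():
        return "missing"
    n_lines = count_lines(path.read_text(encoding="utf-8"))
    return "ok" if n_lines < max_lines else "too long"
```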

Don’t: expose PHI or PII

  • GitHub Copilot is not approved to handle PHI or PII
  • Do not track sensitive data files with git (list them in a .gitignore file)
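As a second line of defense, a script can flag paths that look like data files before they are committed. This is a sketch only: the patterns are hypothetical examples, and `fnmatchcase` globbing is similar to, but not the same as, `.gitignore` matching.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns; adapt them to the data your project actually handles.
SENSITIVE_PATTERNS = ("*.bam", "*.vcf", "*.vcf.gz", "*.fastq.gz", "data/*", "*clinical*.csv")


def flag_sensitive(paths, patterns=SENSITIVE_PATTERNS):
    """Return the paths that match any sensitive-data pattern, in input order."""
    return [p for p in paths if any(fnmatchcase(p, pat) for pat in patterns)]
```

Passing the list of files about to be committed to `flag_sensitive` gives a quick list to inspect by hand. An empty result does not prove that no PHI or PII is present.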

Don’t: blindly accept AI-generated code

NIH AI Guidance

Do not rely on the technology to be a software developer by proxy: All well-written code must adhere to security design and ethical principles. All code output needs to be reviewed for completeness, quality, efficiency, and, most of all, security. Leverage manual and automated validation tools and testing technologies to help ensure these factors. If you cannot identify or understand what a piece of AI generated code does, you should not use it.

AI outputs may be incomplete, inaccurate, biased, or fabricated.

Don’t: allow agents to execute destructive commands

  • Never allow agents to edit data files or output files. You shouldn’t edit these files yourself anyway.

  • Do not allow agents to execute destructive commands without explicit user permission.

    Examples: rm, rmdir, sudo, chmod, chown
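The rule can be illustrated with a deliberately conservative filter that flags any command line mentioning one of the programs listed above. This is a sketch with a hypothetical function name, not a security boundary and not how any particular agent implements approvals. It also produces false positives, for example when `rm` appears only inside a quoted string.

```python
import re

# The commands named on this slide.
DESTRUCTIVE = frozenset({"rm", "rmdir", "sudo", "chmod", "chown"})


def needs_permission(command: str) -> bool:
    """Return True if any word in the command line names a destructive program."""
    # Split on everything except word characters, dots, slashes, and hyphens,
    # so that `ls;rm x` and `/bin/rm x` are both caught.
    words = re.findall(r"[\w./-]+", command)
    return any(word.rsplit("/", 1)[-1] in DESTRUCTIVE for word in words)
```

Erring toward asking is the safer default here: an unnecessary permission prompt costs seconds, while an unreviewed `rm -rf` can cost an analysis.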

Do’s and Don’ts for AI Use

Do

  • Commit your code early and often
  • Use VS Code workspaces
  • Create a Copilot Instructions file

Don’t

  • Expose PHI or PII
  • Blindly accept AI-generated code
  • Allow AI agents to execute destructive commands

Reproducible practices for responsible AI use

Final thoughts

  • Generative AI tools use statistical models to produce outputs that look like a plausible response to a prompt. 🦜
    • Interpreting the meaning of AI outputs is your responsibility.
  • AI tools accelerate bioinformatics. 🚀
    • Just make sure you’re facing the direction you want to go!

Resources