Smart Extraction

How GitSniper captures code review comments with full diff context.

Smart Extraction is GitSniper's core feature. It parses PR/MR pages and captures not just the comment text but also the context an AI needs to understand and act on the feedback: the diff being discussed, the conversation thread, and who said what.

What Gets Extracted

For each review comment, GitSniper captures:

Comment Content

  • The full comment text with formatting preserved
  • Inline code snippets and code blocks
  • Emoji reactions (as a text representation)
  • Edit history indicators

Diff Context

  • The specific lines being commented on
  • Surrounding context lines (additions and deletions)
  • File path and line numbers
  • Whether the comment is on added, removed, or unchanged code

Threading

  • Parent comments and their replies
  • Proper nesting to show conversation flow
  • Resolution status where available

Attribution

  • Author username for each comment
  • Timestamp (relative, e.g., "2 days ago")
  • Reviewer role indicators
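
The four groups above map naturally onto a nested record. The following is a minimal sketch of one possible shape, not GitSniper's actual data model: every type, field, and function name here (`DiffLine`, `Message`, `ReviewComment`, `countMessages`) is a hypothetical chosen for illustration.

```typescript
// Hypothetical data model: these names are illustrative, not GitSniper's actual types.
type DiffMarker = "+" | "-" | " ";

interface DiffLine {
  marker: DiffMarker; // added, removed, or unchanged
  text: string;
}

interface Message {
  author: string;      // username, e.g. "shadowdev"
  role?: string;       // reviewer role indicator, when the page shows one
  timestamp: string;   // relative, e.g. "2 days ago"
  body: string;        // comment text with formatting preserved
  reactions: string[]; // emoji reactions as text
  edited: boolean;     // edit history indicator
  replies: Message[];  // nested replies, in conversation order
}

interface ReviewComment extends Message {
  filePath: string;
  startLine: number;
  endLine: number;
  context: DiffLine[]; // commented lines plus surrounding lines
  resolved?: boolean;  // left out when the platform does not expose it
}

// Counts a comment plus every reply nested beneath it.
function countMessages(message: Message): number {
  return 1 + message.replies.reduce((total, reply) => total + countMessages(reply), 0);
}

const example: ReviewComment = {
  author: "shadowdev",
  role: "Reviewer",
  timestamp: "2 days ago",
  body: "Consider using a pre-computed lookup table.",
  reactions: [],
  edited: false,
  filePath: "src/utils/encoder.ts",
  startLine: 45,
  endLine: 52,
  context: [{ marker: "+", text: "    const lookup = buildLookupTable();" }],
  replies: [
    { author: "author", timestamp: "2 days ago", body: "Good point.", reactions: [], edited: false, replies: [] },
  ],
};
```

Keeping replies as the same `Message` shape lets nesting go as deep as the platform allows, while the diff-specific fields live only on the top-level comment.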

Output Format

The extracted output is plain text with explicit section labels and separators, so an AI model can tell files, diff context, comments, and replies apart:

EXTRACTED: PR #4821 COMMENTS

==================================================

FILE: src/utils/encoder.ts
--------------------------------------------------

DIFF CONTEXT (lines 45-52):
-  function encode(input: string): string {
-    return encodeURIComponent(input);
-  }
+  function encode(input: string): string {
+    const lookup = buildLookupTable();
+    return fastEncode(input, lookup);
+  }

COMMENT (@shadowdev):
Consider using a pre-computed lookup table instead of
building it on each call. This could improve performance
for repeated encoding operations.

  REPLY (@author):
  Good point. I'll move the lookup table to module scope.

---

FILE: src/utils/encoder.ts
--------------------------------------------------

DIFF CONTEXT (lines 78-82):
+  function fastEncode(input: string, lookup: Map<string, string>): string {
+    let result = '';
+    for (const char of input) {
+      result += lookup.get(char) ?? char;
+    }

COMMENT (@shadowdev):
String concatenation in a loop creates many intermediate strings.
Consider using an array and joining at the end:

```typescript
const parts: string[] = [];
for (const char of input) {
  parts.push(lookup.get(char) ?? char);
}
return parts.join('');
```

==================================================
DEBUG INFORMATION:
==================================================
Total comments found: 2
Comments excluded: 0

Authors detected:
  - @shadowdev (Reviewer)

How Context is Determined

GitSniper uses several strategies to capture relevant context:

Line-Based Context

For inline comments, GitSniper captures:

  • The exact lines the comment references
  • Up to 3 lines before and after for context
  • The diff markers (+, -, or space) for each line
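
The windowing rule above can be sketched as a slice that is clamped to the bounds of the diff hunk. This is an illustration under stated assumptions rather than GitSniper's implementation: `HunkLine`, `contextWindow`, and `renderHunk` are hypothetical names, and `start` and `end` are assumed to be 0-based indices into the hunk.

```typescript
// Hypothetical helper types and functions, for illustration only.
interface HunkLine {
  marker: "+" | "-" | " "; // added, removed, or unchanged
  text: string;
}

// Returns the commented lines plus up to `padding` lines on each side,
// clamped so the window never runs past the start or end of the hunk.
function contextWindow(
  lines: HunkLine[],
  start: number, // 0-based index of the first commented line
  end: number,   // 0-based index of the last commented line (inclusive)
  padding = 3,
): HunkLine[] {
  const from = Math.max(0, start - padding);
  const to = Math.min(lines.length - 1, end + padding);
  return lines.slice(from, to + 1);
}

// Renders each line with its diff marker in the first column.
function renderHunk(lines: HunkLine[]): string {
  return lines.map((line) => `${line.marker}${line.text}`).join("\n");
}
```

A comment on the first line of a hunk therefore gets only trailing context, which is why the rule says "up to" 3 lines.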

File Boundaries

When a review covers multiple files or a large diff, GitSniper:

  • Groups comments by file
  • Preserves the order they appear in the review
  • Includes file headers for clear separation
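
One way to get both grouping and stable ordering is an insertion-ordered map keyed by file path. The sketch below assumes that files appear in first-seen order and that comments keep their review order within each file; `groupByFile` is a hypothetical name, and whether GitSniper merges all comments for a file under a single header is also an assumption.

```typescript
// Hypothetical helper, for illustration only.
// A Map iterates in insertion order, so files come out in the order they
// were first seen and comments keep their review order within each file.
function groupByFile<T extends { filePath: string }>(comments: T[]): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const comment of comments) {
    const bucket = groups.get(comment.filePath);
    if (bucket) {
      bucket.push(comment);
    } else {
      groups.set(comment.filePath, [comment]);
    }
  }
  return groups;
}
```

Each map key can then be written out as a `FILE:` header, followed by that file's comments.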

Conversation Threading

For threaded discussions:

  • Parent comments appear first
  • Replies are indented and prefixed with REPLY
  • Resolution status is noted when available
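
The threading rules above amount to a depth-first walk that indents one level per reply. The following sketch assumes a two-space indent, as in the sample output; `ThreadNode` and `renderThread` are hypothetical names. Resolution status is left out because the sample output does not show how it is written.

```typescript
// Hypothetical thread shape and renderer, for illustration only.
interface ThreadNode {
  author: string;
  body: string;
  replies: ThreadNode[];
}

// Renders the parent first, then each reply one indent level deeper.
function renderThread(node: ThreadNode, depth = 0): string {
  const indent = "  ".repeat(depth);
  const label = depth === 0 ? "COMMENT" : "REPLY";
  const header = `${indent}${label} (@${node.author}):`;
  const body = node.body
    .split("\n")
    .map((line) => indent + line)
    .join("\n");
  const replies = node.replies.map((reply) => renderThread(reply, depth + 1));
  return [`${header}\n${body}`, ...replies].join("\n\n");
}
```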

Handling Edge Cases

Large PRs

For PRs with many comments, GitSniper processes the comments sequentially. There is no hard limit on the number of comments extracted.

Code Suggestions

GitHub's "suggested changes" feature is captured with the suggested code clearly marked.

Outdated Comments

Comments on code that has since changed are still extracted, with the original context preserved.

Rich Formatting

Markdown formatting (bold, italic, lists) is preserved. Images and external links are kept in their Markdown syntax.

Optimising for AI Tools

The output format is designed for AI comprehension:

  1. Clear section markers (===, ---) help models parse structure
  2. Explicit labels (FILE:, DIFF CONTEXT, COMMENT, REPLY) identify content types
  3. Preserved formatting maintains code block syntax
  4. Attribution lets the AI understand who said what

This structure works well with both chat-based AI (Claude, ChatGPT) and code-focused tools (GitHub Copilot, Cursor).
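
To illustrate why explicit labels and separators help, here is a rough sketch of how a downstream script might split the output back into entries. It is not part of GitSniper: `ParsedEntry` and `parseExtraction` are hypothetical, the regular expressions are inferred from the sample output above, and each line is trimmed for simplicity, which drops indentation inside comment bodies.

```typescript
// Hypothetical parser for the sample format shown earlier; illustration only.
interface ParsedEntry {
  file: string;
  kind: "COMMENT" | "REPLY";
  author: string;
  body: string;
}

function parseExtraction(text: string): ParsedEntry[] {
  const entries: ParsedEntry[] = [];
  let file = "";
  let open = false; // true while lines belong to the latest entry's body

  for (const raw of text.split("\n")) {
    const line = raw.trim(); // note: trimming loses indentation in code blocks
    const fileMatch = /^FILE:\s*(.+)$/.exec(line);
    const labelMatch = /^(COMMENT|REPLY) \(@([^)]+)\):$/.exec(line);

    if (fileMatch) {
      file = fileMatch[1];
      open = false;
    } else if (labelMatch) {
      entries.push({
        file,
        kind: labelMatch[1] as "COMMENT" | "REPLY",
        author: labelMatch[2],
        body: "",
      });
      open = true;
    } else if (/^(={3,}|-{3,})$/.test(line) || line.startsWith("DIFF CONTEXT")) {
      open = false; // separators and diff headers end the current body
    } else if (open) {
      const entry = entries[entries.length - 1];
      entry.body = entry.body ? `${entry.body}\n${line}` : line;
    }
  }
  return entries.map((entry) => ({ ...entry, body: entry.body.trim() }));
}
```

Because every entry starts with an unambiguous label, the parser needs no knowledge of the comment contents, which is the same property that helps a model keep files, comments, and replies apart.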