Smart Extraction | GitSniper Docs

Smart Extraction is GitSniper's core feature. It parses pull request (PR) and merge request (MR) pages to capture not only the comment text but also the context an AI needs to understand and act on the feedback: the diff being discussed, the conversation thread, and who said what.

What Gets Extracted

For each review comment, GitSniper captures:

Comment Content

The full comment text with formatting preserved
Inline code snippets and code blocks
Emoji reactions (as text representation)
Edit history indicators

Diff Context

The specific lines being commented on
Surrounding context lines (additions and deletions)
File path and line numbers
Whether the comment is on added, removed, or unchanged code

Threading

Parent comments and their replies
Proper nesting to show conversation flow
Resolution status where available

Attribution

Author username for each comment
Timestamp (relative, e.g., "2 days ago")
Reviewer role indicators
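
The fields listed above can be pictured as one record per review thread. The TypeScript sketch below is illustrative only: the type and field names (`CommentMeta`, `ReviewThread`, `DiffLine`, and so on) are assumptions made for this example rather than GitSniper's actual internal schema, and the values are sample data.

```typescript
// Hypothetical data model for one extracted review thread.
// Names and shapes are assumptions for illustration, not GitSniper's schema.

type DiffMarker = "+" | "-" | " ";

interface DiffLine {
  marker: DiffMarker; // "+" added, "-" removed, " " unchanged
  text: string;
}

// Fields shared by a parent comment and its replies.
interface CommentMeta {
  author: string;       // username, without the leading "@"
  relativeTime: string; // e.g. "2 days ago"
  role?: string;        // e.g. "Reviewer", when an indicator is present
  body: string;         // comment text with formatting preserved
  reactions: string[];  // emoji reactions as text
  edited: boolean;      // edit history indicator
}

// A parent comment plus its diff context and replies.
interface ReviewThread extends CommentMeta {
  filePath: string;
  startLine: number;
  endLine: number;
  commentedOn: "added" | "removed" | "unchanged";
  diff: DiffLine[];
  replies: CommentMeta[];
  resolved?: boolean;   // only set where the platform exposes it
}

// Sample values only.
const example: ReviewThread = {
  author: "shadowdev",
  role: "Reviewer",
  relativeTime: "2 days ago",
  body: "Consider using a pre-computed lookup table instead of building it on each call.",
  reactions: [],
  edited: false,
  filePath: "src/utils/encoder.ts",
  startLine: 45,
  endLine: 52,
  commentedOn: "added",
  diff: [
    { marker: "-", text: "    return encodeURIComponent(input);" },
    { marker: "+", text: "    const lookup = buildLookupTable();" },
    { marker: "+", text: "    return fastEncode(input, lookup);" },
  ],
  replies: [
    {
      author: "author",
      relativeTime: "1 day ago",
      body: "Good point. I'll move the lookup table to module scope.",
      reactions: [],
      edited: false,
    },
  ],
};
```

Keeping replies as a lighter type than the parent reflects the idea that diff context belongs to the thread as a whole; a real implementation might model this differently.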

Output Format

The extracted output is plain text with explicit section markers and labels, so that AI tools can parse it reliably. For example:

EXTRACTED: PR #4821 COMMENTS

==================================================

FILE: src/utils/encoder.ts
--------------------------------------------------

DIFF CONTEXT (lines 45-52):
-  function encode(input: string): string {
-    return encodeURIComponent(input);
-  }
+  function encode(input: string): string {
+    const lookup = buildLookupTable();
+    return fastEncode(input, lookup);
+  }

COMMENT (@shadowdev):
Consider using a pre-computed lookup table instead of
building it on each call. This could improve performance
for repeated encoding operations.

  REPLY (@author):
  Good point. I'll move the lookup table to module scope.

---

FILE: src/utils/encoder.ts
--------------------------------------------------

DIFF CONTEXT (lines 78-82):
+  function fastEncode(input: string, lookup: Map<string, string>): string {
+    let result = '';
+    for (const char of input) {
+      result += lookup.get(char) ?? char;
+    }

COMMENT (@shadowdev):
String concatenation in a loop creates many intermediate strings.
Consider using an array and joining at the end:

```typescript
const parts: string[] = [];
for (const char of input) {
  parts.push(lookup.get(char) ?? char);
}
return parts.join('');
```

==================================================
DEBUG INFORMATION:
==================================================
Total comments found: 2
Comments excluded: 0

Authors detected:
  - @shadowdev (Reviewer)

How Context is Determined

GitSniper uses several strategies to capture relevant context:

Line-Based Context

For inline comments, GitSniper captures:

The exact lines the comment references
Up to 3 lines before and after for context
The diff markers (+, -, or space) for each line
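
The line-based rule above can be sketched as a small windowing function. This is a minimal illustration under stated assumptions: `contextWindow`, `renderDiff`, and the `DiffLine` type are hypothetical names, and line positions are treated as zero-based indexes into an already parsed diff.

```typescript
// Hypothetical sketch of "commented lines plus up to 3 lines of context".
// Not GitSniper's actual implementation.

type DiffMarker = "+" | "-" | " ";

interface DiffLine {
  marker: DiffMarker; // "+" added, "-" removed, " " unchanged
  text: string;
}

const CONTEXT_LINES = 3;

// start and end are zero-based, inclusive indexes of the commented lines.
function contextWindow(
  lines: DiffLine[],
  start: number,
  end: number,
  pad: number = CONTEXT_LINES,
): DiffLine[] {
  // Clamp to the diff boundaries, which is why the rule says "up to" 3 lines.
  const from = Math.max(0, start - pad);
  const to = Math.min(lines.length - 1, end + pad);
  return lines.slice(from, to + 1);
}

// Keep the diff marker as the first character of each rendered line.
function renderDiff(lines: DiffLine[]): string {
  return lines.map((line) => `${line.marker}${line.text}`).join("\n");
}
```

A comment near the top or bottom of a hunk simply receives fewer context lines, because the window is clamped rather than padded.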

File Boundaries

When a review spans multiple files or a large diff, GitSniper:

Groups comments by file
Preserves the order they appear in the review
Includes file headers for clear separation
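
Grouping by file while keeping review order can be sketched with a `Map`, which iterates its keys in insertion order. The function name `groupByFile` and the `filePath` field are assumptions for illustration; how GitSniper performs the grouping internally is not documented here.

```typescript
// Hypothetical sketch: group comments by file, preserving review order.
// Files appear in the order they are first seen; comments within a file
// keep their original relative order.

function groupByFile<T extends { filePath: string }>(comments: T[]): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const comment of comments) {
    const bucket = groups.get(comment.filePath);
    if (bucket) {
      bucket.push(comment);
    } else {
      groups.set(comment.filePath, [comment]);
    }
  }
  return groups;
}
```

A plain object keyed by path would also work for typical file paths, but a `Map` makes the ordering guarantee explicit.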

Conversation Threading

For threaded discussions:

Parent comments appear first
Replies are indented and prefixed with REPLY
Resolution status is noted when available
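
The threading rules above can be sketched as a renderer that emits the parent first and indents each reply by two spaces, matching the sample output earlier on this page. The names `Thread`, `Reply`, and `renderThread` are hypothetical. The sketch leaves out resolution status, because the sample output does not show how it is rendered.

```typescript
// Hypothetical sketch of thread rendering; not GitSniper's actual code.

interface Reply {
  author: string;
  body: string;
}

interface Thread {
  author: string;
  body: string;
  replies: Reply[];
}

// Prefix every line of a (possibly multi-line) block.
function indent(text: string, prefix: string): string {
  return text
    .split("\n")
    .map((line) => prefix + line)
    .join("\n");
}

function renderThread(thread: Thread): string {
  // Parent comment first.
  const parts: string[] = [`COMMENT (@${thread.author}):`, thread.body];
  // Then each reply, separated by a blank line and indented by two spaces.
  for (const reply of thread.replies) {
    parts.push("");
    parts.push(indent(`REPLY (@${reply.author}):\n${reply.body}`, "  "));
  }
  return parts.join("\n");
}
```

Indenting every line of a reply, not only its label, keeps multi-line replies visually attached to their parent.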

Handling Edge Cases

Large PRs

For PRs with many comments, GitSniper processes the comments sequentially. There is no hard limit on the number of comments extracted.

Code Suggestions

Suggestions made with GitHub's "suggested changes" feature are captured, with the suggested code clearly marked.

Outdated Comments

Comments on code that has since changed are still extracted, with the original context preserved.

Rich Formatting

Markdown formatting (bold, italic, lists) is preserved. Images and external links remain as markdown syntax.

Optimising for AI Tools

The output format is designed for AI comprehension:

Clear section markers (===, ---) help models parse structure
Explicit labels (FILE, DIFF CONTEXT, COMMENT, REPLY) identify content types
Preserved formatting maintains code block syntax
Attribution lets the AI understand who said what

This structure works well with both chat-based AI (Claude, ChatGPT) and code-focused tools (GitHub Copilot, Cursor).
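
The same labels also make the output straightforward to post-process in a script. The sketch below is a consumer-side illustration, not part of GitSniper: `parseSections` and `ParsedSection` are hypothetical names, and the patterns assume the label shapes shown in the sample output (`FILE: path`, `COMMENT (@user):`, `REPLY (@user):`).

```typescript
// Hypothetical consumer-side parser for the labelled output format.
// It collects, per FILE section, the authors of comments and replies.

interface ParsedSection {
  file: string;
  authors: string[];
}

function parseSections(output: string): ParsedSection[] {
  const sections: ParsedSection[] = [];
  let current: ParsedSection | undefined;

  for (const line of output.split("\n")) {
    // A FILE: label starts a new section.
    const fileMatch = /^FILE:\s*(.+)$/.exec(line);
    if (fileMatch) {
      current = { file: fileMatch[1].trim(), authors: [] };
      sections.push(current);
      continue;
    }
    // COMMENT and REPLY labels carry the author; replies may be indented.
    const authorMatch = /^\s*(?:COMMENT|REPLY) \(@([^)]+)\):/.exec(line);
    if (authorMatch && current) {
      current.authors.push(authorMatch[1]);
    }
  }
  return sections;
}
```

One limitation of this approach is that a comment body containing a line that starts with `FILE:` would be misread as a new section, so a more careful parser would also track the separator lines.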