Smart Extraction is GitSniper's core feature. It parses pull request (PR) and merge request (MR) pages to capture not only the comment text but also the context an AI needs to understand and act on the feedback.
What Gets Extracted
For each review comment, GitSniper captures:
Comment Content
- The full comment text with formatting preserved
- Inline code snippets and code blocks
- Emoji reactions (as text representation)
- Edit history indicators
Diff Context
- The specific lines being commented on
- Surrounding context lines (additions and deletions)
- File path and line numbers
- Whether the comment is on added, removed, or unchanged code
Threading
- Parent comments and their replies
- Proper nesting to show conversation flow
- Resolution status where available
Attribution
- Author username for each comment
- Timestamp (relative, e.g., "2 days ago")
- Reviewer role indicators
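One way to picture a single extracted comment is as a record with one field per item listed above. The sketch below is illustrative only: the type and field names (ExtractedComment, DiffLine, describeMarker, and so on) are assumptions made for this example, not GitSniper's actual internal types.

```typescript
// Hypothetical data model: every name here is illustrative,
// not taken from GitSniper's source.
type DiffMarker = "+" | "-" | " ";

interface DiffLine {
  marker: DiffMarker; // "+" added, "-" removed, " " unchanged
  text: string;
}

interface ExtractedComment {
  // Comment content
  body: string;         // comment text, code blocks included
  reactions: string[];  // text representation of emoji reactions
  edited: boolean;      // edit history indicator

  // Diff context
  filePath: string;
  startLine: number;
  endLine: number;
  diff: DiffLine[];

  // Threading
  replies: ExtractedComment[];
  resolved?: boolean;   // absent when the platform does not expose it

  // Attribution
  author: string;
  relativeTime: string; // e.g. "2 days ago"
  role?: string;        // e.g. "Reviewer"
}

// Translate a diff marker into the kind of code a comment sits on.
function describeMarker(marker: DiffMarker): "added" | "removed" | "unchanged" {
  switch (marker) {
    case "+":
      return "added";
    case "-":
      return "removed";
    default:
      return "unchanged";
  }
}

// Example record; the values are made up for illustration.
const sample: ExtractedComment = {
  body: "Consider using a pre-computed lookup table.",
  reactions: [],
  edited: false,
  filePath: "src/utils/encoder.ts",
  startLine: 45,
  endLine: 52,
  diff: [{ marker: "+", text: "const lookup = buildLookupTable();" }],
  replies: [],
  author: "shadowdev",
  relativeTime: "2 days ago",
  role: "Reviewer",
};
```

Keeping replies as a nested array of the same type mirrors the parent/reply nesting described under Threading.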
Output Format
The extracted output is plain text with labelled sections that an AI model can parse easily:
EXTRACTED: PR #4821 COMMENTS
==================================================
FILE: src/utils/encoder.ts
--------------------------------------------------
DIFF CONTEXT (lines 45-52):
- function encode(input: string): string {
-   return encodeURIComponent(input);
- }
+ function encode(input: string): string {
+   const lookup = buildLookupTable();
+   return fastEncode(input, lookup);
+ }
COMMENT (@shadowdev):
Consider using a pre-computed lookup table instead of
building it on each call. This could improve performance
for repeated encoding operations.
REPLY (@author):
Good point. I'll move the lookup table to module scope.
---
FILE: src/utils/encoder.ts
--------------------------------------------------
DIFF CONTEXT (lines 78-82):
+ function fastEncode(input: string, lookup: Map<string, string>): string {
+   let result = '';
+   for (const char of input) {
+     result += lookup.get(char) ?? char;
+   }
COMMENT (@shadowdev):
String concatenation in a loop creates many intermediate strings.
Consider using an array and joining at the end:
```typescript
const parts: string[] = [];
for (const char of input) {
  parts.push(lookup.get(char) ?? char);
}
return parts.join('');
```
==================================================
DEBUG INFORMATION:
==================================================
Total comments found: 2
Comments excluded: 0
Authors detected:
- @shadowdev (Reviewer)
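To make the layout above concrete, here is a minimal sketch of a renderer that produces the same header, rules, and labels from structured data. The ReviewComment shape and the renderComment and renderReport functions are hypothetical names invented for this example, and the DEBUG INFORMATION footer is left out for brevity; GitSniper's real implementation may differ.

```typescript
// Hypothetical input shapes, kept minimal for this sketch.
interface Reply {
  author: string;
  body: string;
}

interface ReviewComment {
  file: string;
  startLine: number;
  endLine: number;
  diff: string[]; // lines already prefixed with "+", "-", or " "
  author: string;
  body: string;
  replies: Reply[];
}

const HEAVY_RULE = "==================================================";
const LIGHT_RULE = "--------------------------------------------------";

// Render one comment section: file header, diff context, comment, replies.
function renderComment(c: ReviewComment): string {
  const lines: string[] = [
    `FILE: ${c.file}`,
    LIGHT_RULE,
    `DIFF CONTEXT (lines ${c.startLine}-${c.endLine}):`,
  ];
  for (const diffLine of c.diff) {
    lines.push(diffLine);
  }
  lines.push(`COMMENT (@${c.author}):`, c.body);
  for (const reply of c.replies) {
    lines.push(`REPLY (@${reply.author}):`, reply.body);
  }
  return lines.join("\n");
}

// Render the report header followed by each section, separated by "---".
function renderReport(prNumber: number, comments: ReviewComment[]): string {
  const header = `EXTRACTED: PR #${prNumber} COMMENTS\n${HEAVY_RULE}`;
  const body = comments.map(renderComment).join("\n---\n");
  return `${header}\n${body}`;
}

const report = renderReport(4821, [
  {
    file: "src/utils/encoder.ts",
    startLine: 45,
    endLine: 52,
    diff: ["+ function encode(input: string): string {", "+ }"],
    author: "shadowdev",
    body: "Consider using a pre-computed lookup table.",
    replies: [{ author: "author", body: "Good point." }],
  },
]);
console.log(report);
```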
How Context is Determined
GitSniper uses several strategies to capture relevant context:
Line-Based Context
For inline comments, GitSniper captures:
- The exact lines the comment references
- Up to 3 lines before and after for context
- The diff markers (+, -, or space) for each line
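The windowing described above can be sketched as a clamped slice over the lines of a diff. This is an illustration under assumptions: the DiffLine type and contextWindow function are hypothetical, and indices are zero-based positions in the diff rather than file line numbers.

```typescript
// Hypothetical representation of one diff line.
interface DiffLine {
  marker: "+" | "-" | " "; // added, removed, or unchanged
  text: string;
}

// "Up to 3 lines before and after", per the description above.
const CONTEXT_LINES = 3;

// Return the commented lines plus up to `pad` lines on either side,
// clamped so the window never runs past the start or end of the diff.
function contextWindow(
  diff: DiffLine[],
  first: number, // index of the first commented line
  last: number,  // index of the last commented line
  pad: number = CONTEXT_LINES,
): DiffLine[] {
  const start = Math.max(0, first - pad);
  const end = Math.min(diff.length - 1, last + pad);
  return diff.slice(start, end + 1);
}

// Ten unchanged lines, used only to demonstrate the clamping.
const demoDiff: DiffLine[] = [];
for (let i = 0; i < 10; i++) {
  demoDiff.push({ marker: " ", text: `line ${i}` });
}

const nearTop = contextWindow(demoDiff, 1, 1); // only one line exists above
const middle = contextWindow(demoDiff, 5, 6);  // window reaches the end of the diff
```

The "up to" in the description is what the clamping implements: a comment near the top or bottom of a diff simply gets fewer context lines on that side.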
File Boundaries
When a comment references multiple files or spans a large diff, GitSniper:
- Groups comments by file
- Preserves the order they appear in the review
- Includes file headers for clear separation
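One possible reading of this grouping is sketched below: comments are bucketed by file, groups appear in the order each file is first seen, and comments inside a group keep their review order. The groupByFile function and FileGroup type are hypothetical, and this is only one interpretation of the behaviour described.

```typescript
interface FileGroup<T> {
  file: string;
  comments: T[];
}

// Group comments by file path while preserving review order,
// both across groups (first appearance) and within each group.
function groupByFile<T extends { file: string }>(comments: T[]): FileGroup<T>[] {
  const groups: FileGroup<T>[] = [];
  const byFile = new Map<string, FileGroup<T>>();
  for (const comment of comments) {
    let group = byFile.get(comment.file);
    if (group === undefined) {
      group = { file: comment.file, comments: [] };
      byFile.set(comment.file, group);
      groups.push(group);
    }
    group.comments.push(comment);
  }
  return groups;
}

// Made-up input: the second path is a placeholder for illustration.
const grouped = groupByFile([
  { file: "src/utils/encoder.ts", id: 1 },
  { file: "src/index.ts", id: 2 },
  { file: "src/utils/encoder.ts", id: 3 },
]);
```

Here grouped holds two groups: src/utils/encoder.ts with comments 1 and 3, then src/index.ts with comment 2.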
Conversation Threading
For threaded discussions:
- Parent comments appear first
- Replies are indented with a REPLY prefix
- Resolution status is noted when available
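The threading rules above can be sketched as a small recursive renderer. Everything here is an assumption made for illustration: the ThreadComment type, the two-space indent per nesting level, and in particular the STATUS: label, since the source does not say how resolution status is written.

```typescript
// Hypothetical thread shape; replies nest to any depth.
interface ThreadComment {
  author: string;
  body: string;
  replies: ThreadComment[];
  resolved?: boolean; // undefined when the platform does not expose it
}

// Indent every line of a (possibly multi-line) comment body.
function indentLines(text: string, indent: string): string[] {
  return text.split("\n").map((line) => indent + line);
}

// Emit each reply with a REPLY prefix, one indent level deeper than its parent.
function renderReplies(replies: ThreadComment[], indent: string, out: string[]): void {
  for (const reply of replies) {
    out.push(`${indent}REPLY (@${reply.author}):`);
    out.push(...indentLines(reply.body, indent));
    renderReplies(reply.replies, indent + "  ", out);
  }
}

// The parent comment comes first, then its replies, then the status if known.
function renderThread(root: ThreadComment): string[] {
  const out: string[] = [`COMMENT (@${root.author}):`, root.body];
  renderReplies(root.replies, "  ", out);
  if (root.resolved !== undefined) {
    // "STATUS:" is an invented label, not part of the documented format.
    out.push(root.resolved ? "STATUS: resolved" : "STATUS: unresolved");
  }
  return out;
}

const threadLines = renderThread({
  author: "shadowdev",
  body: "Consider a pre-computed lookup table.",
  replies: [{ author: "author", body: "Good point.", replies: [] }],
  resolved: true,
});
console.log(threadLines.join("\n"));
```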
Handling Edge Cases
Large PRs
GitSniper processes the comments of a large PR sequentially. There is no hard limit on the number of comments extracted.
Code Suggestions
GitHub's "suggested changes" feature is captured with the suggested code clearly marked.
Outdated Comments
Comments on code that has since changed are still extracted, with the original context preserved.
Rich Formatting
Markdown formatting (bold, italic, lists) is preserved. Images and external links are kept as Markdown syntax.
Optimising for AI Tools
The output format is designed for AI comprehension:
- Clear section markers (===, ---) help models parse structure
- Explicit labels (FILE:, COMMENT:, REPLY:) identify content types
- Preserved formatting maintains code block syntax
- Attribution lets the AI understand who said what
This structure works well with both chat-based AI (Claude, ChatGPT) and code-focused tools (GitHub Copilot, Cursor).
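To show why the markers and labels make the output easy to process, here is a minimal sketch of a line classifier that a downstream script could run over the extracted text. The classifyLine function and LineKind type are hypothetical. The patterns follow the sample output, where comment and reply labels carry an author, as in COMMENT (@shadowdev):.

```typescript
type LineKind = "file" | "comment" | "reply" | "rule" | "text";

// Classify one line of extracted output by its label or marker.
function classifyLine(line: string): LineKind {
  if (/^FILE: /.test(line)) return "file";
  if (/^COMMENT \(@[^)]+\):/.test(line)) return "comment";
  if (/^\s*REPLY \(@[^)]+\):/.test(line)) return "reply";
  // A rule is a line made up solely of three or more "=" or "-" characters,
  // so diff lines such as "- }" are not mistaken for separators.
  if (/^(={3,}|-{3,})$/.test(line)) return "rule";
  return "text";
}

const kinds = [
  "==================================================",
  "FILE: src/utils/encoder.ts",
  "- }",
  "COMMENT (@shadowdev):",
  "REPLY (@author):",
  "---",
].map(classifyLine);
```

Because every structural line starts with a fixed label or consists only of rule characters, a few anchored patterns are enough to recover the structure without a full parser.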