AI Slop Annotation Explorer

Dashboard

Code Frequency

Coverage Heatmap

Documents × codes. Cell value = number of posts with that code in that document.
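The cell definition above can be sketched in a few lines of Python. This is a minimal illustration, not the explorer's actual implementation: the sample posts, the tuple layout, and the function name coverage_matrix are assumptions made for the example. Only the document IDs and code names are taken from the coding procedure.

```python
from collections import defaultdict

# Hypothetical annotated posts: (document id, post id, assigned codes).
# The data is invented for illustration only.
posts = [
    ("R02", "p1", {"reviewer-burden", "sarcastic-skepticism"}),
    ("R02", "p2", {"reviewer-burden"}),
    ("H01", "p3", {"ai-content-detection"}),
]


def coverage_matrix(posts):
    """Count, for each (document, code) pair, the posts carrying that code."""
    counts = defaultdict(lambda: defaultdict(int))
    for doc_id, _post_id, codes in posts:
        for code in codes:
            counts[doc_id][code] += 1
    # Convert to plain dicts so missing cells are absent rather than auto-created.
    return {doc: dict(row) for doc, row in counts.items()}


matrix = coverage_matrix(posts)
print(matrix["R02"]["reviewer-burden"])
```

A code that never occurs in a document simply has no entry in that document's row, which a heatmap would render as an empty or zero cell.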

Documents

Browse annotated discussions by document. Click a document to expand its thread. Click code pills to view code details in the codebook.

By Code

Browse annotated posts grouped by code. Click a card to see all posts that received that code. Click “View in codebook” for the full code definition.

Co-Occurrence

This matrix shows how often two codes appear on the same post. Hover cells for details.
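As a rough sketch of how such a matrix could be computed, the following Python counts every unordered pair of codes assigned to the same post. The sample data and the function name cooccurrence are hypothetical. The explorer's own computation may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical code sets, one per annotated post (illustrative data only).
post_codes = [
    {"reviewer-burden", "sarcastic-skepticism"},
    {"reviewer-burden", "sarcastic-skepticism", "ai-content-detection"},
    {"ai-content-detection"},
]


def cooccurrence(post_codes):
    """Count how often each unordered pair of codes shares a post."""
    counts = Counter()
    for codes in post_codes:
        # Sorting makes (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(set(codes)), 2):
            counts[(a, b)] += 1
    return counts


pairs = cooccurrence(post_codes)
top_pairs = pairs.most_common(1)
print(top_pairs)
```

Posts with a single code contribute nothing to the matrix, and calling most_common on the counter gives a ranking of the kind shown under Top Co-Occurring Pairs.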

Co-Occurrence Matrix

Top Co-Occurring Pairs

Coding Procedure

This section describes the coding procedure for the paper “An Endless Stream of AI Slop”: The Growing Burden of AI-Assisted Software Development by Sebastian Baltes, Marc Cheong, and Christoph Treude. It covers how the codebook was developed, how annotations were produced, and how code assignments were reviewed across four rounds. The analysis followed an iterative coding approach inspired by Corbin and Strauss (2015).

1. Open Coding

The second author read through a selection of five documents (R01, R02, R03, R04, R05) and performed open coding for themes, generating an initial set of 40 codes.

2. Axial Coding

All three authors conferred and performed axial coding on this initial list. Similar codes were merged, and codes that did not address the research question were discarded or subsumed into broader codes. This produced a jointly agreed-upon set of 8 codes.

3. Iterative Refinement

The first author iteratively revised and extended the codebook through eleven revisions, assisted by Claude Code (Opus 4.6, high effort). Following emerging practices for human–AI collaboration in qualitative analysis (Dunivin, 2025), the AI tool assisted with codebook refinement while a human author retained decision authority over all structural changes.

This refinement involved: splitting overly broad codes into more specific ones; adding decision questions for each code to guide consistent application; developing boundary notes to disambiguate easily-confused codes; curating additional examples; and documenting dual-coding guidance. The archived PDF files were converted to JSON files with nested comment trees for this step; content was verified to be unaltered. Examples included in the final codebook were cross-checked against the live online versions of the posts.
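To illustrate what a JSON file with a nested comment tree might look like, here is a minimal Python sketch that loads one thread and walks it depth-first. The field names (id, text, replies) and the sample thread are assumptions. The actual schema of the converted files is not described here.

```python
import json

# Hypothetical thread; field names "id", "text", and "replies" are assumed.
raw = """
{"id": "post", "text": "Original post", "replies": [
  {"id": "c1", "text": "First reply", "replies": [
    {"id": "c2", "text": "Nested reply", "replies": []}
  ]},
  {"id": "c3", "text": "Second reply", "replies": []}
]}
"""


def walk(node, depth=0):
    """Yield (depth, id) for a node and all of its replies, depth-first."""
    yield depth, node["id"]
    for reply in node.get("replies", []):
        yield from walk(reply, depth + 1)


thread = json.loads(raw)
flat = list(walk(thread))
for depth, comment_id in flat:
    print("  " * depth + comment_id)
```

Keeping the reply nesting intact means each comment can be read together with its parent and the original post.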

4. Cross-Checking

We developed an interactive visualization of the code relationship graph, rendered as a network graph with communities identified by Louvain community detection (Blondel et al., 2008). The second and third authors used this visualization alongside the codebook to review and validate the final code set.

5. Corpus Annotation

Using the finalized codebook, the first author annotated the full corpus with Claude Code (Opus 4.6, high effort) in two passes. In the first pass, Claude Code filtered each document to retain only sub-discussions relevant to AI slop in software development and produced a relevance justification for each retained post. The first author reviewed the filtering results before proceeding. In the second pass, Claude Code coded the filtered documents: each post received applicable codes and a coding justification explaining the assignment. The first author then spot-checked the assigned codes and justifications for plausibility. The annotation prompt, designed by the first author, instructed the model to evaluate each post against the codebook’s decision questions, apply boundary notes and code relationships for disambiguation, and assign multiple codes rather than forcing single-code assignments. The prompt is available as part of the supplementary material.

6. Annotation Review

The author team reviewed all annotations in four rounds. Each round refined both code assignments and the codebook itself. Rounds 2 through 4 followed a consistent process: the second author reviewed a batch of documents in depth, shared feedback with the first author, and all three authors discussed the discrepancies and resolved them. The team then revised the codebook accordingly, and the first author used Claude Code to apply the revised codebook across all 15 documents.

Round 1. The first author spot-checked annotations across the corpus using the visualization tool. This mainly triggered changes to the way justifications were formulated, in preparation for the manual inspection that followed.

Round 2. The second author reviewed six documents in depth (R02, R05, R06, H01, H02, H03) and proposed 57 changes. The team accepted 50, challenged 2, and rejected 5 after discussion. Most accepted changes added secondary codes that the original annotation had missed; a smaller number removed codes where the AI slop reference was too incidental to warrant coding. The team widened the ai-content-detection definition to include observations about the difficulty or impossibility of detection (not only active detection strategies) and formalized thread-context guidance, that is, evaluating each comment relative to the original post and surrounding discussion, not in isolation. The first author then used Claude Code to apply these changes across all 15 documents, which yielded 9 additional revisions across 5 documents (R07, R08, R10, R13, H02).

Round 3. The second author reviewed five documents (R07, R08, R10, R11, R12) and proposed 102 changes. The team accepted 94 after discussion. The most consequential codebook change tightened the out_of_scope rule: comments on tangential topics (general workplace norms, unrelated anecdotes, reactions, and meta-commentary to news items) are not coded merely because they appear in an AI-slop-related thread. Other updates added boundary notes to structural-drivers, slop-mitigations, and mandated-ai-adoption to resolve recurring ambiguities the review had exposed. The first author then applied these changes across all 15 documents with Claude Code, which yielded 21 additional revisions across 9 documents.

Round 4. The second author reviewed the remaining document (R13) and proposed 60 changes; the team accepted 55 after discussion. Two codebook refinements followed: a boundary note on sarcastic-skepticism clarified when it should complement topical codes and when it should stand alone, and the team extended reviewer-burden to cover maintainers interacting directly with AI agents, not only reviewing AI-generated code submitted by humans. The team also added a brief-comment guideline to the annotation prompt, clarifying that short comments should be evaluated in thread context: bare agreements or emotional reactions are left uncoded, but short comments that introduce a new characterization or rhetorical move (such as sarcasm) are coded. The first author applied these changes across all 15 documents with Claude Code, which yielded 4 additional revisions in R02, R06, and R08.

Summary. Across all four rounds, the team made 234 post-level revisions (some posts were revised in more than one round). No primary code assignment was systematically wrong; human review primarily caught missed dual- and multiple-coding opportunities and removed codes from bare reactions or tangential comments rather than correcting outright errors.