Agentic AI Coding Tool Configurations

A dataset of configuration artifacts and AI-co-authored commits for Claude Code, Gemini CLI, Codex CLI, Copilot CLI, and Cursor across GitHub repositories.

Dashboard

Tool Adoption

Configuration Types

Visualizations

Interactive visualizations of tool adoption and configuration patterns.

Configuration Mechanisms per Tool

Usage of configuration mechanisms across agentic tools. Repositories using only AGENTS.md without tool-specific configuration are excluded. Percentages are relative to the repository count per tool; column totals can exceed the overall repository count because a repository can use multiple tools.
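The per-tool percentages above can be reproduced from the boolean tool and mechanism flags in repos.csv. A minimal sketch on a tiny synthetic frame (the "hooks" flag column is an assumption; the usage example below only shows "claude" and "mcp"):

```python
import pandas as pd

# Synthetic stand-in for repos.csv: boolean tool and mechanism flags
repos = pd.DataFrame({
    "claude": [True, True, False, True],
    "cursor": [False, True, True, False],
    "mcp":    [True, False, True, True],
    "hooks":  [False, True, False, True],
})

tools = ["claude", "cursor"]
mechanisms = ["mcp", "hooks"]

# For each tool, the share of its repositories using each mechanism
rows = {}
for tool in tools:
    subset = repos[repos[tool]]
    rows[tool] = subset[mechanisms].mean() * 100  # percentage per mechanism

table = pd.DataFrame(rows).round(1)
print(table)
```

Because each column is normalized by its own tool's repository count, the columns need not sum to 100.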

Tool Usage by Programming Language

Adoption of agentic tools per programming language; AGENTS.md denotes repositories using only that file with no tool-specific configuration. Percentages are relative to the repository count per tool; column totals can exceed the overall repository count because a repository can use multiple tools.
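A sketch of the per-language breakdown, again on synthetic data with the same shape (the "gemini" flag column is an assumption; "mainLanguage" appears in the usage example below):

```python
import pandas as pd

# Synthetic stand-in for repos.csv with a language column and tool flags
repos = pd.DataFrame({
    "mainLanguage": ["Python", "Python", "TypeScript", "Go"],
    "claude":       [True, False, True, True],
    "gemini":       [False, True, True, False],
})

# Share of each tool's repositories per language (columns normalized per tool)
shares = (
    repos.groupby("mainLanguage")[["claude", "gemini"]].sum()
    / repos[["claude", "gemini"]].sum()
    * 100
)
print(shares.round(1))
```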

Tool Co-Occurrence

Co-occurrence of tools across repositories. Percentages are relative to the smaller group in each pair.

Reference Network Between Context Files

Reference-only context files. Examples of other files include README.md and CONTRIBUTING.md.

Tool Usage Combinations (Exclusive Intersections)

Each bar shows repositories that have exactly that combination of tools and no others. A repository appears in exactly one bar.

Distribution of Configuration Files per Repository

Configuration mechanism count per repository.

Creation Order Sequences

Creation order of context files per repository. Curly braces indicate that files were added on the same day.

Cumulative Adoption of Configuration Artifacts

Cumulative adoption of selected configuration artifacts for agentic tools. Copilot Instructions comprise two context file artifact types.

Repositories

Repositories in which at least one AI tool configuration artifact was detected.

AI-Authored Commits

Commits co-authored by AI tools, detected via commit trailers and metadata.

Configuration Artifacts

Browse individual configuration artifacts by type.

Downloads

Download the complete dataset files. Licensed under CC BY 4.0. The data is also available on Zenodo.

Usage Example

Load and explore the dataset with pandas. The pipeline dataset on Zenodo contains data analysis examples.

.py
import pandas as pd

# repos.csv: main dataset (40,585 rows)
repos = pd.read_csv("repos.csv")

# Three levels of data: sampling frame, classified, configured
# engineered_project is a string: "true", "false", or "unsure"
engineered_repos = repos[repos["engineered_project"] == "true"]
configured_repos = repos[repos["scanned_at"].notna()]

# Tool flags: filter repos by AI tool
claude_repos = configured_repos[configured_repos["claude"] == True]

# Config type flags: filter by configuration mechanism
repos_with_mcp = configured_repos[configured_repos["mcp"] == True]

# GitHub metadata: stars, forks, contributors, code lines, ...
popular = configured_repos[configured_repos["stargazers"] >= 1000]

# context_files.csv: context file artifacts
context_files = pd.read_csv("context_files.csv")

# Git metadata: creation dates
context_files["created_at"] = pd.to_datetime(context_files["created_at"])

# AI authorship: which files were initially created by an AI tool
ai_created = context_files[context_files["first_commit_ai_created"] == True]

# References: context files that point to other files
references = context_files[context_files["is_reference"] == True]

# Join with repo metadata
merged = context_files.merge(
    configured_repos[["repo_name", "mainLanguage", "stargazers"]],
    on="repo_name"
)

# commits.csv: AI-co-authored commits
commits = pd.read_csv("commits.csv")
commits["commit_timestamp"] = pd.to_datetime(commits["commit_timestamp"])

# Filter by AI tool (e.g., "Claude", "Copilot", "Cursor")
claude_commits = commits[commits["ai_tool"].str.contains("Claude")]

# Artifact detail files
skills = pd.read_csv("skills.csv") # skill definitions
subagents = pd.read_csv("subagents.csv") # subagent definitions
commands = pd.read_csv("commands.csv") # custom commands
rules = pd.read_csv("rules.csv") # rule files
settings = pd.read_csv("settings.csv") # settings files
hooks = pd.read_csv("hooks.csv") # hook configurations
mcp = pd.read_csv("mcp.csv") # MCP configurations

# All artifact files share: repo_name, created_at, #commits,
# github_link, is_empty, first/last_commit_sha
# Some have extra columns (e.g., skills: name, scripts, references)
skills_with_scripts = skills[skills["scripts"] == True]
agents_with_memory = subagents[subagents["memory"] == True]