Agentic AI Coding Tool Configurations

A dataset of configuration artifacts and AI-co-authored commits for Claude Code, Gemini CLI, Codex CLI, Copilot CLI, and Cursor across GitHub repositories.

Dashboard

Tool Adoption

Configuration Types

Visualizations

Interactive visualizations of tool adoption and configuration patterns.

Configuration Mechanisms per Tool

Usage of configuration mechanisms across agentic tools. Repositories using only AGENTS.md without tool-specific configuration are excluded. Percentages are relative to the repository count per tool; column totals can exceed the overall repository count because a repository can use multiple tools.
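The per-tool percentages above can be reproduced from the boolean tool and mechanism flags in repos.csv. A minimal sketch on a tiny synthetic frame (the "hooks" flag column is an assumption; the usage example below only shows "claude" and "mcp"):

```python
import pandas as pd

# Synthetic stand-in for repos.csv: boolean tool and mechanism flags
repos = pd.DataFrame({
    "claude": [True, True, False, True],
    "cursor": [False, True, True, False],
    "mcp":    [True, False, True, True],
    "hooks":  [False, True, False, True],
})

tools = ["claude", "cursor"]
mechanisms = ["mcp", "hooks"]

# For each tool, the share of its repositories using each mechanism
rows = {}
for tool in tools:
    subset = repos[repos[tool]]
    rows[tool] = subset[mechanisms].mean() * 100  # percentage per mechanism

table = pd.DataFrame(rows).round(1)
print(table)
```

Because each column is normalized by its own tool's repository count, the columns need not sum to 100.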

Tool Usage by Programming Language

Adoption of agentic tools per programming language; AGENTS.md denotes repositories using only that file with no tool-specific configuration. Percentages are relative to the repository count per tool; column totals can exceed the overall repository count because a repository can use multiple tools.
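A sketch of the per-language breakdown, again on synthetic data with the same shape (the "gemini" flag column is an assumption; "mainLanguage" appears in the usage example below):

```python
import pandas as pd

# Synthetic stand-in for repos.csv with a language column and tool flags
repos = pd.DataFrame({
    "mainLanguage": ["Python", "Python", "TypeScript", "Go"],
    "claude":       [True, False, True, True],
    "gemini":       [False, True, True, False],
})

# Share of each tool's repositories per language (columns normalized per tool)
shares = (
    repos.groupby("mainLanguage")[["claude", "gemini"]].sum()
    / repos[["claude", "gemini"]].sum()
    * 100
)
print(shares.round(1))
```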

Tool Co-Occurrence

Co-occurrence of tools across repositories. Percentages are relative to the smaller group in each pair.

Reference Network Between Context Files

Reference-only context files. Examples of other files include README.md and CONTRIBUTING.md.

Tool Usage Combinations (Exclusive Intersections)

Each bar shows repositories that have exactly that combination of tools and no others. A repository appears in exactly one bar.

Distribution of Configuration Files per Repository

Configuration mechanism count per repository.

Creation Order Sequences

Creation order of context files per repository. Curly braces indicate that files were added on the same day.

Cumulative Adoption of Configuration Artifacts

Cumulative adoption of selected configuration artifacts for agentic tools. Copilot Instructions comprise two context file artifact types.

Repositories

Repositories in which at least one AI tool configuration artifact was detected.

AI-Authored Commits

Commits co-authored by AI tools, detected via commit trailers and metadata.

Configuration Artifacts

Browse individual configuration artifacts by type.

Downloads

Download the complete dataset files. Licensed under CC BY 4.0. The data is also available on Zenodo.

Usage Example

Load and explore the dataset with pandas. The pipeline dataset on Zenodo contains data analysis examples.

.py
import pandas as pd

# repos.csv: main dataset (40,585 rows)
repos = pd.read_csv("repos.csv")

# Three levels of data: sampling frame, classified, configured
# engineered_project is a string: "true", "false", or "unsure"
engineered_repos = repos[repos["engineered_project"] == "true"]
configured_repos = repos[repos["scanned_at"].notna()]

# Tool flags: filter repos by AI tool
claude_repos = configured_repos[configured_repos["claude"] == True]

# Config type flags: filter by configuration mechanism
repos_with_mcp = configured_repos[configured_repos["mcp"] == True]

# GitHub metadata: stars, forks, contributors, code lines, ...
popular = configured_repos[configured_repos["stargazers"] >= 1000]

# context_files.csv: context file artifacts
context_files = pd.read_csv("context_files.csv")

# Git metadata: creation dates
context_files["created_at"] = pd.to_datetime(context_files["created_at"])

# AI authorship: which files were initially created by an AI tool
ai_created = context_files[context_files["first_commit_ai_created"] == True]

# References: context files that point to other files
references = context_files[context_files["is_reference"] == True]

# Join with repo metadata
merged = context_files.merge(
    configured_repos[["repo_name", "mainLanguage", "stargazers"]],
    on="repo_name"
)

# commits.csv: AI-co-authored commits
commits = pd.read_csv("commits.csv")
commits["commit_timestamp"] = pd.to_datetime(commits["commit_timestamp"])

# Filter by AI tool (e.g., "Claude", "Copilot", "Cursor")
claude_commits = commits[commits["ai_tool"].str.contains("Claude")]

# Artifact detail files
skills = pd.read_csv("skills.csv") # skill definitions
subagents = pd.read_csv("subagents.csv") # subagent definitions
commands = pd.read_csv("commands.csv") # custom commands
rules = pd.read_csv("rules.csv") # rule files
settings = pd.read_csv("settings.csv") # settings files
hooks = pd.read_csv("hooks.csv") # hook configurations
mcp = pd.read_csv("mcp.csv") # MCP configurations

# All artifact files share: repo_name, created_at, #commits,
# github_link, is_empty, first/last_commit_sha
# Some have extra columns (e.g., skills: name, scripts, references)
skills_with_scripts = skills[skills["scripts"] == True]
agents_with_memory = subagents[subagents["memory"] == True]