Governance Genome Explorer (Literature-Grounded: Vault + External Scholarship)

Fingerprint 2.0 "The Prism" (Response to Peer Review) — 2,021 AI governance statements × 226 dimensions × 6 policy families
2,021 Statements · 226 Genome Fields · 347 Features · 6 Policy Families · 0.155 Silhouette · 45 Anchors

Policy Family Distribution

Top Organization Types

Regional Distribution

Statement Volume Over Time

The Governance Genome encodes 2,021 AI governance statements as 226-dimensional vectors across 3 channels (Content, Form, Anchored Embeddings), revealing 6 distinct policy families in the global AI governance landscape. This "Prism" architecture achieves silhouette 0.155, providing a structured lens into how diverse institutions frame AI principles and regulation.

The 226-Dimension Governance Genome

Each of the 2,021 AI governance statements in the Tapestry database is encoded as a 226-dimensional vector — its "governance genome." These dimensions span three channels: C1 Content (155 dimensions capturing substantive normative positions — transparency, accountability, dignity, tradition-specific concepts like ubuntu, khalifah, and imago Dei), C2 Form (26 base dimensions capturing institutional structure — who wrote it, how binding it is, for whom), and C3 Anchored (45 dimensions measuring semantic similarity to reference anchor statements). After one-hot and multi-hot expansion of categorical fields, the feature space grows to 374 columns, of which 347 survive variance-based pruning.
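The one-hot/multi-hot expansion step can be sketched with pandas and scikit-learn. The field names below (`gn_binding_nature`, `gn_tags`, `gn_specificity`) are illustrative stand-ins, not the actual schema:

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Toy stand-ins for C2 form fields (actual field names differ).
df = pd.DataFrame({
    "gn_binding_nature": ["binding", "soft_law", "advisory"],   # enum -> one-hot
    "gn_tags": [["rights", "safety"], ["safety"], ["rights"]],  # tag list -> multi-hot
    "gn_specificity": [65, 40, 15],                             # continuous, kept as-is
})

# One-hot expand the enum field (one binary column per category).
onehot = pd.get_dummies(df["gn_binding_nature"], prefix="gn_binding_nature")

# Multi-hot expand the tag-list field (one binary column per tag).
mlb = MultiLabelBinarizer()
multihot = pd.DataFrame(
    mlb.fit_transform(df["gn_tags"]),
    columns=[f"gn_tags__{t}" for t in mlb.classes_],
)

features = pd.concat([onehot, multihot, df[["gn_specificity"]]], axis=1)
print(features.shape)  # 3 enum columns + 2 tag columns + 1 score column
```

This expansion is why the 226 base dimensions inflate to a larger feature count: every enum category and every tag becomes its own binary column.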

Method: Features were derived via grounded theory from 12 agents × 3 waves × 120 document extracts, consolidated by an expert council into a structured rubric with 14 pillars, scored by calibrated LLM coders (Krippendorff's α = 0.942), and validated via Random Forest classification (91.6% accuracy).

Feature Catalog

Browse all 347 features in the governance genome. Use the filters and search box to narrow results. Click any row to expand its detail card showing the full description, value histogram, and per-cluster means.

Feature Architecture

The 226 genome dimensions are not a flat list — they form a structured hierarchy. Channel 1 (Content) organizes 155 dimensions into 14 pillars (e.g., Human Dignity, Transparency, Safety), each containing concepts and tradition-specific specifics. Channel 2 (Form) encodes 9 categorical enums (expanded to 72 one-hot columns), 8 multi-label tag-lists (37 multi-hot columns), and 14 continuous scores. Channel 3 (Anchored) provides 45 semantic similarity dimensions.

Channel & Pillar Sunburst

Inner ring: 3 channels sized by feature count. Middle ring: C1 pillars, C2 type groups, C3 as a single block. Outer ring: individual features (visible on hover). Color by channel: C1 Content, C2 Form, C3 Anchored.

C1 Rubric Taxonomy

The Content channel's 14 pillars, their concepts (T2), and tradition-specific specifics (T3). Node size reflects activation rate (nonzero%). Brighter nodes appear in more statements; dimmer nodes are specialized (e.g., tradition-specific concepts).

Feature Types by Channel

Each channel's composition by data type. C1 is entirely continuous scores. C2 mixes scores, one-hot encodings, and multi-hot tag expansions. C3 is all continuous.

Pillar Size Treemap

The 14 C1 pillars sized by number of features. Color intensity reflects mean variance within the pillar — brighter pillars have more variable features.

How Features Activate Across Statements

Most C1 Content features are highly sparse — tradition-specific concepts like ubuntu, khalifah, and imago Dei are deeply meaningful where they appear but absent from the vast majority of secular documents. C2 Form features are denser — most statements have a binding nature, an authority type, and a geographic scope. C3 Anchored features are nearly universal (every statement has some similarity to every anchor).
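Activation rate is straightforward to compute: the share of statements with a nonzero value on each feature. A minimal numpy sketch, with a small toy matrix standing in for the real 2,021 × 347 one:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy matrix: 100 statements x 5 features with increasing density
# (stand-in for the real statement-by-feature matrix).
X = rng.random((100, 5)) * (rng.random((100, 5)) < [0.03, 0.2, 0.5, 0.8, 0.95])

# Activation rate: fraction of statements with a nonzero value per feature.
activation = (X != 0).mean(axis=0)

# Bucket features the way the channel sparsity profile does.
sparse   = int((activation < 0.05).sum())
moderate = int(((activation >= 0.05) & (activation <= 0.5)).sum())
dense    = int((activation > 0.5).sum())
print(activation.round(2), sparse, moderate, dense)
```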

Sparsity Spectrum

All features ranked by activation rate (% of statements with nonzero value). Left end: highly sparse features active in <5% of statements. Right end: near-universal features active in >90%. Color by channel.

Variance by Channel

Distribution of standard deviations within each channel. C2 Form features show the highest variance (driven by binary one-hot columns). C1 and C3 cluster at lower variance. Wider boxes indicate more heterogeneous feature behavior.

Variance vs. Sparsity

Each dot is one feature. X-axis: activation rate. Y-axis: standard deviation. Color by channel, size by permutation importance. The upper-middle zone contains the most informative features — variable enough to discriminate, common enough to be measurable.

Channel Sparsity Profile

For each channel, how many features fall into sparse (<5% nonzero), moderate (5–50%), and dense (>50%) categories. Confirms that C1 is predominantly sparse, C2 mixed, and C3 dense.

Which Features Drive the 6 Policy Families?

Not all 347 features contribute equally to the clustering structure. C2 Form features dominate — institutional form (who wrote it, how binding, for whom) is far more predictive of governance family than substantive content (what the document says about transparency or fairness). This section explores feature importance, inter-feature correlations, and the effective dimensionality of the genome space.

Permutation Importance (Top 30)

Features ranked by how much classification accuracy drops when their values are shuffled. Color by channel. C2 Form dominance confirms institutional structure drives clustering.
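This kind of permutation importance can be sketched with scikit-learn's `permutation_importance`; synthetic data stands in here for the genome matrix and family labels:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Toy stand-in: 300 "statements", 10 features, 3 "families".
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           n_classes=3, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Shuffle each feature in turn and measure the resulting accuracy drop.
result = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
print([(int(i), round(float(result.importances_mean[i]), 3)) for i in ranking[:5]])
```

Features whose shuffling barely moves accuracy carry little information for cluster discrimination; the top of the ranking identifies the robust discriminators.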

Gini Importance (Top 30)

Features ranked by impurity reduction across all Random Forest decision splits. Features high on both perm and Gini lists are the most robust discriminators.

Importance by Channel

Inner ring: permutation importance share. Outer ring: Gini importance share. Both confirm C2 Form features account for ~65% of total importance.

Importance vs. Variance

X-axis: standard deviation (variance). Y-axis: permutation importance. Color by channel, size by activation rate. Reveals which high-variance features actually matter for clustering vs. which are noise.

Feature Correlation Heatmap (Top 50)

Pairwise Pearson correlations among the 50 highest-variance features, sorted by hierarchical clustering so correlated features are adjacent. Blue = negative, red = positive. Blocks along the diagonal reveal natural feature groups.
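The ordering trick — hierarchical clustering on 1 − |r| so that correlated features land adjacent on the axes — can be sketched with scipy (toy data with two correlated feature blocks):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
# Toy feature matrix: 200 statements x 8 features, two correlated blocks.
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, [0]] + 0.3 * rng.normal(size=(200, 4)),
                     base[:, [1]] + 0.3 * rng.normal(size=(200, 4))])

corr = np.corrcoef(X, rowvar=False)          # 8 x 8 Pearson matrix

# Convert correlation to a distance, then order leaves so that
# highly correlated features sit next to each other.
dist = 1 - np.abs(corr)
np.fill_diagonal(dist, 0.0)
order = leaves_list(linkage(squareform(dist, checks=False), method="average"))
print(order)
```

Plotting `corr[np.ix_(order, order)]` then shows the block-diagonal structure described above.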

PCA Scree Plot

Bars: individual explained variance per principal component. Line: cumulative variance. Dashed lines at 80%, 90%, 95%. Shows the effective dimensionality — how many components capture most of the genome's information.
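The effective-dimensionality readout is a cumulative sum over explained variance ratios. A sketch with scikit-learn, using a toy matrix with known latent structure:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Toy redundant matrix: 500 statements, 40 features driven by ~8 latent axes.
latent = rng.normal(size=(500, 8))
X = latent @ rng.normal(size=(8, 40)) + 0.1 * rng.normal(size=(500, 40))

pca = PCA().fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)

# Effective dimensionality: number of components needed for 80% of variance.
k80 = int(np.searchsorted(cumvar, 0.80)) + 1
print(k80)
```

Because the toy data has only 8 latent axes, far fewer than 40 components reach the 80% threshold — the same redundancy signature the scree plot reveals in the genome space.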

PC1 & PC2 Top Loadings

The 10 strongest positive and 10 strongest negative loadings for the first two principal components. Reveals what latent axes the genome space is organized around.

Natural Feature Clusters

Hierarchical clustering on the feature correlation matrix reveals ~10 natural groupings of features that tend to co-occur across statements. Each card below shows the cluster's size, dominant channel, top members (sorted by importance), and mean intra-cluster correlation.

Bottom line: The governance genome's 347 features have substantial redundancy — ~91 PCA components explain 80% of variance. Features cluster into ~10 natural groups, confirming that the 226 dimensions encode a smaller number of latent governance axes. The dominant axis is institutional form (C2), not substantive content (C1) — who writes governance documents matters more than what they say.

The Governance Genome Matrix

This satellite view shows the complete governance genome — every one of the 347 features scored for every one of the 2,021 statements in the Tapestry database. Each pixel represents a single score (0–100). The result is a ~700,000-cell heatmap that reveals the macro-structure of global AI governance: which features activate where, which clusters of documents share similar profiles, and where the sparse frontier of tradition-specific concepts lights up.

Color scale: Black (0 = absent) → Navy → Teal → Yellow → White (100 = maximum). Interactions: Scroll to zoom, drag to pan, hover for details. Use the ordering controls to rearrange rows and columns.

Row orderings: By Cluster — grouped by 6 policy families, most active first within each; By Year — chronological (1998→2026); By Org Type — grouped by institution type (government, religious, professional, etc.); By Region — grouped by geographic region; Seriation — optimal leaf ordering that places similar statements adjacent (smoothest gradient); By Activation — sorted by total feature activation, most active at top.
Column orderings: Channel + Pillar — grouped by C1→C2→C3→Crosswalk, then by rubric pillar within C1; By Importance — most predictive features first (RF permutation importance); By Variance — highest-variance features first; Seriation — optimal leaf ordering for columns.
▬ Column Tracks (top of heatmap)
Track 1 — Channel: C1 Content · C2 Form · C3 Anchored · Crosswalk
Track 2 — Pillar: P01 P02 P03 P04 P05 P06 P07 P08 P09 P10 P11 P12 P13 P14 non-C1
Track 3 — Importance: Bar height ∝ RF permutation importance (white)
▮ Row Tracks (left of heatmap)
Track 1 — Cluster: Innovation · Early Signals · Religious · Rights · Professional · Regulatory
Track 2 — Org Type: govt · IGO · religious · academic · civil soc. · industry · professional · NEB
Track 3 — Region: Europe · N. America · Asia-Pacific · Global · Africa · LATAM · Middle East · Oceania
Track 4 — Year: 1998 → 2026
Hover over the heatmap to see feature names

Reading the heatmap: Bright horizontal bands indicate statements that engage deeply with many governance dimensions. Dark horizontal bands are narrow, focused documents. Bright vertical columns are features that activate broadly across the corpus (e.g., accountability, transparency). Dark vertical columns are tradition-specific concepts (e.g., khalifah, imago Dei) that light up only for their tradition's documents.

Drill-Down Explorer

Select filters to view a readable subset of the matrix with labeled rows and columns. Maximum ~5,000 cells for interactive exploration. Click any cell to see statement details.


Guided Pattern Analysis

Pre-computed analyses highlighting key structural patterns visible in the genome matrix. Each pattern shows a focused subset of the heatmap with narrative explanation.

UMAP Policy Landscape

Each dot represents one of 2,021 AI governance statements, projected from 226 genome dimensions down to 2D using UMAP (Uniform Manifold Approximation and Projection). Nearby dots are similar in their governance profile; distant dots differ substantially. Clusters, gaps, and gradients in the map reveal the topography of the global AI governance landscape.

Caution: UMAP preserves local neighborhood structure (nearby points are genuinely similar) but does not preserve global distances (the distance between distant clusters is not meaningful). Cluster shapes and inter-cluster gaps should be interpreted qualitatively, not metrically.

UMAP Projection — 2,021 Statements

The main projection. Use the ‘Color By’ dropdown to explore different facets: Policy Family reveals the 6 clusters; Organization Type shows institutional composition; Region reveals geographic patterns; Year Band shows temporal layering; Binding Nature highlights the enforcement landscape; Sacred-Secular illuminates the tradition spectrum.


Selection Breakdown

The 6 Policy Families

K-means clustering on the 226-dimensional genome vectors reveals 6 distinct policy families — groups of statements that share similar governance DNA. Each family card below shows the family’s size, name, and top distinguishing dimensions (features where the family mean most exceeds the corpus mean). These are not arbitrary groupings: a Random Forest classifier can predict family membership with 91.6% accuracy from the genome dimensions alone.
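The two-step pattern — K-means to discover families, then a Random Forest to check that those families are predictable from the features — can be sketched with scikit-learn on toy data:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy genome-like matrix: 600 statements, 20 dims, 6 latent families.
X, _ = make_blobs(n_samples=600, n_features=20, centers=6, random_state=0)

# Step 1: discover families with K-means.
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)

# Step 2: validate separability by predicting the cluster labels
# from the features with cross-validated classification accuracy.
acc = cross_val_score(RandomForestClassifier(random_state=0), X, labels, cv=5).mean()
print(round(acc, 3))
```

High cross-validated accuracy is evidence that the clusters correspond to genuinely separable regions of the feature space, not arbitrary partitions.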

Pillar Intensity Heatmap (Families × Dimensions)

Rows are the 6 families; columns are the 14 governance pillars (Rights, Transparency, Safety, etc.). Cell color intensity shows the family’s mean score on that pillar. This reveals each family’s ‘governance fingerprint’ — where it invests its normative attention. Dark rows indicate families with sparse profiles; bright rows indicate families that engage deeply with many pillars.

Organization Type by Family

Stacked bars showing which institution types (government, religious, professional, etc.) populate each family. Governments split across Innovation Champions and Risk Regulators; religious organizations concentrate almost entirely in the Religious family. This confirms that institutional identity correlates with governance approach but does not determine it (ARI = 0.20).

Region by Family

Geographic distribution within each family. Innovation Champions draw heavily from Asia-Pacific and Africa; Risk Regulators are Europe-dominated; Religious governance is globally distributed across all regions with organized religious institutions. Regional patterns reflect both policy priorities and institutional capacity.

Family Deep Dives

Select a policy family below to explore its governance profile in detail. The radar chart shows the family’s mean scores on its top distinguishing dimensions. The top dimensions list ranks features by how much the family’s mean exceeds (or falls below) the corpus-wide mean. Exemplar statements are the documents closest to the family centroid — the most “typical” members of each governance approach.

Cluster Quality & Selection

Choosing the number of clusters (k) requires balancing interpretability with statistical quality. The silhouette score measures how similar each statement is to its own cluster versus the nearest neighboring cluster (+1 = perfect fit, 0 = boundary, −1 = misclassified). We selected k=6 based on interpretability, stability analysis, and expert validation — not solely on silhouette optimization.
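The k-selection diagnostic is a short loop over candidate k values; a sketch with scikit-learn, toy blobs standing in for the genome matrix:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=400, n_features=12, centers=5, random_state=1)

# Mean silhouette at each candidate k.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On real high-dimensional governance data the curve is much flatter than on toy blobs, which is why silhouette alone did not determine the choice of k=6.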

Silhouette Score Curve (k = 2–12)

Mean silhouette score at each k value. Higher is better, but the curve typically decreases monotonically in high-dimensional data. The absence of a clear ‘elbow’ is expected — governance documents form a continuous landscape with soft boundaries, not crisp natural clusters. The k=6 solution balances granularity with interpretability.

Per-Cluster Silhouette

Mean silhouette score for each of the 6 families at the chosen k=6 solution. Higher bars indicate more cohesive families. The Religious family typically scores highest (its members are most distinctive). Lower-scoring families (e.g., Low-Specificity Instruments) contain more heterogeneous documents.

Cluster Size Distribution

Number of statements in each family. A balanced distribution suggests the clustering is not dominated by one mega-cluster. The current solution ranges from 125 (Religious) to 450 (Regulatory), reflecting genuine variation in governance activity across different institutional approaches.

Why silhouette 0.155? In high-dimensional data (226 dimensions), silhouette scores are systematically lower than in low-dimensional toy datasets. A silhouette of 0.155 is comparable to published results in computational social science and text clustering. The key validation is that a Random Forest achieves 91.6% classification accuracy on these clusters — they are genuinely separable even if boundaries are soft.

Cluster Stability Across k Values

A robust clustering solution should be stable under small perturbations. This sub-tab examines what happens when we vary k (the number of clusters) from 5 to 7. If the same families reappear across different k values, the structure is genuine. If families fragment or merge unpredictably, the solution is fragile.

Cluster Stability (k=5 to k=7)

Jaccard overlap matrix showing how clusters at k=5, k=6, and k=7 map onto each other. High overlap (bright cells) between k=5 and k=6 families means those families survive when splitting from 5 to 6 groups. The Religious family typically shows the highest stability (Jaccard > 0.9) because it is the most structurally distinct.
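A Jaccard overlap matrix between two clusterings can be computed directly from the label vectors; a sketch with toy data, comparing a k=5 and a k=6 solution:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=500, n_features=10, centers=6, random_state=3)
labels_a = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
labels_b = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)

def jaccard_matrix(a, b):
    """Jaccard overlap |intersection| / |union| between every pair
    of clusters from solutions a and b."""
    ka, kb = a.max() + 1, b.max() + 1
    J = np.zeros((ka, kb))
    for i in range(ka):
        for j in range(kb):
            ai, bj = set(np.where(a == i)[0]), set(np.where(b == j)[0])
            J[i, j] = len(ai & bj) / len(ai | bj)
    return J

J = jaccard_matrix(labels_a, labels_b)
print(J.round(2))
```

A row with a single bright cell means that k=5 family survives intact at k=6; a row with two moderate cells means it split.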

Silhouette Distribution by Cluster

Distribution of per-statement silhouette scores within each family. The median line shows typical cohesion; the spread shows heterogeneity. Families with tight distributions (small interquartile range) are internally coherent. Families with wide distributions or negative tails contain statements that are borderline members — they could plausibly belong to a neighboring family.

Temporal Evolution of AI Governance

AI governance has undergone dramatic shifts since the first documents appeared in our corpus (1998). Three eras are visible: 2010–2019 (innovation-dominant, few binding documents), 2020–2022 (regulatory crossover, EU AI Act process begins), and 2023–2026 (rapid expansion across all families, religious emergence). The corpus grew from 6 statements per year (2014) to over 400 per year (2024), reflecting the global urgency of AI governance.

Family Composition Over Time (Area)

Stacked area chart showing cumulative growth of each policy family over time. The vertical extent of each colored band shows that family’s share of the cumulative total. The rapid widening after 2020 reflects the global governance explosion. The appearance of the Religious family band (brown) after 2017 is clearly visible.

Annual Counts by Family (Stacked Bar)

Year-by-year statement counts, stacked by family. Peak years reveal governance surges: 2019 (post-OECD Principles), 2023 (EU AI Act adoption), 2024 (global implementation wave). The relative heights of family segments show which governance approaches dominate each year.

Genome Drift — Mean Dimension Shift by Year

Measures how much the average governance statement has changed over time across all 226 genome dimensions. Rising drift indicates the governance conversation is evolving — new concepts are entering, old ones are being re-weighted. Flat periods indicate consolidation around established frameworks.

Era Comparison Radar

Radar chart comparing the mean genome profile across three eras. Each axis represents a key governance dimension. Differences between era profiles reveal which concepts have grown (e.g., risk-based governance, environmental impact) and which have faded (e.g., purely aspirational framing).

The regulatory turn is real: In 2017–2019, innovation-optimist documents outnumbered risk-regulatory ones 2:1. By 2020–2022, risk regulation had pulled ahead. By 2023–2026, the gap widened further. The crossover year was 2020, when the EU AI Act legislative process began. Meanwhile, religious governance went from zero (pre-2017) to 125 statements — the fastest-growing family in the corpus.

The Sacred-Secular Spectrum

Each statement receives a sacred-secular score (0–100) based on the presence and intensity of religious, theological, and tradition-specific governance concepts. A score of 0 means entirely secular (e.g., EU AI Act); a score of 100 means deeply rooted in a specific religious tradition (e.g., Islamic Fiqh Academy Resolution 258). This score is derived from the C1 content dimensions in pillars P10–P12 (Religious, Islamic, Indigenous governance concepts).

Sacred-Secular Score Distribution

Histogram showing how sacred-secular scores distribute across 2,021 statements. The distribution is heavily right-skewed: the vast majority of documents are fully secular (score 0), with a smaller tail of tradition-rooted documents. The 125 statements in the Religious/Tradition-Based family (Family 3) cluster at the high end.

UMAP Colored by Sacred-Secular Score

The same 2D UMAP projection as the Policy Map, but colored by sacred-secular score. Yellow/bright dots indicate tradition-rooted documents, which cluster tightly in a distinct region of the map — confirming that religious governance occupies a structurally separate position in the global landscape, not scattered among secular documents.

Religious & Indigenous Traditions

The Governance Genome identifies 15 distinct value traditions invoked in AI governance documents, from secular human rights frameworks to specific religious and indigenous knowledge systems. This sub-tab maps which traditions are represented, how they co-occur within documents, and their relative scale in the corpus. The emergence of tradition-based governance — from zero statements before 2017 to 125 by 2026 — is one of the most distinctive findings of this analysis.

Tradition Composition Treemap

Proportional area map of all traditions invoked across the corpus. Secular and human rights frameworks dominate by volume, but religious traditions (Christian, Islamic, Buddhist, Jewish, Hindu) and indigenous wisdom collectively constitute a significant minority. The treemap reveals governance pluralism invisible in prior mapping studies.

Tradition Co-occurrence Network

Nodes are traditions; edges connect traditions that are invoked together in the same document. Edge thickness reflects co-occurrence frequency. Dense clusters reveal tradition families that travel together — e.g., Islamic concepts (Islamic + care ethics) or interfaith documents (Christian + Jewish + Islamic). Isolated nodes indicate traditions that rarely appear alongside others.

Tradition Detail Table

Full statistics for each tradition: count of documents invoking it, percentage of corpus, mean sacred-secular score, and cluster distribution. Sortable by any column.

Tradition Pairwise Correlations

This heatmap shows pairwise Phi coefficients between all tradition invocations. Phi measures the association between two binary variables (tradition invoked vs. not invoked). Values near +1 indicate traditions that co-occur strongly; values near 0 indicate independence; negative values indicate mutual exclusion. Statistical significance is Bonferroni-corrected for multiple comparisons.

Pairwise Phi Coefficients

Each cell shows the Phi coefficient between two traditions. Red/warm cells indicate positive association (traditions that appear together); blue/cool cells indicate negative association (traditions that rarely co-occur). The diagonal is always 1.0 (perfect self-correlation). Look for clusters of warm cells that reveal tradition ‘families’ — groups of traditions that function as integrated knowledge systems rather than independent principles.

Bonferroni correction: With multiple pairwise comparisons across religious and philosophical traditions, we apply Bonferroni correction to control the family-wise error rate. Only correlations significant at p < 0.05 / k (where k is the number of pairwise tests) are shown as non-grey cells. This conservative threshold reduces false-positive tradition associations while preserving genuinely strong co-occurrence patterns.
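A sketch of the Phi-plus-Bonferroni procedure with numpy and scipy, on a toy binary invocation matrix; the chi-square test of independence here stands in for whatever exact significance test the study used:

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(4)
# Toy binary invocation matrix: 300 documents x 4 traditions.
inv = rng.random((300, 4)) < 0.2
inv[:, 1] |= inv[:, 0]          # tradition 0 always co-occurs with tradition 1

def phi(x, y):
    """Phi coefficient between two binary vectors (2x2 contingency)."""
    n11 = np.sum(x & y);  n10 = np.sum(x & ~y)
    n01 = np.sum(~x & y); n00 = np.sum(~x & ~y)
    denom = np.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

k = 4 * 3 // 2                  # number of pairwise tests
alpha = 0.05 / k                # Bonferroni-corrected threshold

for i in range(4):
    for j in range(i + 1, 4):
        table = np.array([[np.sum(inv[:, i] & inv[:, j]),  np.sum(inv[:, i] & ~inv[:, j])],
                          [np.sum(~inv[:, i] & inv[:, j]), np.sum(~inv[:, i] & ~inv[:, j])]])
        p = chi2_contingency(table)[1]
        print(i, j, round(float(phi(inv[:, i], inv[:, j])), 2), bool(p < alpha))
```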

Enforcement & Legal Force

One of the most striking patterns in the data is the enforcement-rhetoric inverse: documents with the richest ethical commitments tend to be non-binding, while documents with real legal force have traded rights language for risk metrics. This tab examines how binding nature (legally binding, soft law, advisory, aspirational) distributes across the 6 policy families, and how enforcement capacity relates to governance specificity and stakeholder breadth.

Binding Nature × Family Heatmap

Cell intensity shows how many statements of each binding type fall in each family. Risk Regulators (Family 6) concentrate legally binding instruments (60.6% of all binding docs). Human Rights Advocates (Family 4) are overwhelmingly soft law or aspirational. This is the structural signature of the enforcement-rhetoric inverse.

Enforcement Gradient — Soft to Hard Law

Statements arranged along a soft-to-hard law spectrum. The gradient reveals whether families cluster at particular enforcement levels or spread across the spectrum. Innovation Champions cluster at the soft end; Risk Regulators at the hard end.

Inverse Relationship: Specificity vs Breadth

Each point represents a cluster of statements. The x-axis measures governance specificity (how concrete the prescriptions are); the y-axis measures stakeholder breadth (how many audiences are addressed). Documents that are highly specific tend to address narrow audiences (sector-specific regulation), while broad documents tend to be vague (aspirational declarations).

Implication: The global AI governance system has structurally separated its moral ambition from its enforcement capacity. Rights-rich documents are toothless; enforceable documents are rights-thin. This is not a bug in any particular document — it is an architectural pattern in the system as a whole. Bridging this gap is perhaps the most important challenge facing AI governance.

Channel Ablation — Which Channels Drive Cluster Structure?

The Governance Genome encodes each statement across 3 channels: C1 Content (155 dimensions — what the document says about transparency, fairness, safety, etc.), C2 Form (26 dimensions — who wrote it, for whom, how binding), and C3 Anchored Embeddings (45 dimensions — semantic similarity to 45 concept anchors). Ablation analysis removes one channel at a time and re-clusters to measure each channel’s contribution. Bootstrap 95% confidence intervals quantify uncertainty.

Channel Ablation with Bootstrap 95% CIs

Each bar shows the silhouette score when clustering with a specific channel combination. Error whiskers show 95% bootstrap CIs (500 iterations). The ‘All channels’ bar is the baseline. If removing a channel barely changes silhouette, that channel contributes little to cluster structure. If removing it collapses the score, it is essential.
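Bootstrap confidence intervals for a silhouette score follow the usual resample-and-rescore recipe; a sketch with 200 iterations (the study reports 500):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, n_features=15, centers=4, random_state=5)
labels = KMeans(n_clusters=4, n_init=10, random_state=5).fit_predict(X)

# Bootstrap the silhouette: resample statements with replacement, rescore.
rng = np.random.default_rng(5)
boot = []
for _ in range(200):
    idx = rng.integers(0, len(X), len(X))
    if len(np.unique(labels[idx])) > 1:    # silhouette needs >= 2 clusters
        boot.append(silhouette_score(X[idx], labels[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])  # 95% CI
print(round(float(lo), 3), round(float(hi), 3))
```

Non-overlapping CIs between two channel combinations are evidence that the ablated channel's contribution is real rather than resampling noise.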

Channel Dimension Share

Proportional breakdown of the 226 genome dimensions by channel. C1 Content accounts for the majority of dimensions (155/226 = 69%), but C2 Form dominates cluster structure despite having only 26 dimensions — a disproportionate influence that reveals the power of institutional metadata.

Feature Analysis & Sensitivity

Of the 374 expanded features derived from the 226 genome dimensions, 347 survived variance-based pruning (features with near-zero variance across statements were removed). This sub-tab examines which of the retained features are most discriminative for cluster assignment, how sparse different channels are, and whether the results are sensitive to feature selection thresholds and UMAP hyperparameters.

Top 30 Discriminative Features

Features ranked by discriminative power (F-statistic from one-way ANOVA across 6 clusters). Color indicates channel: orange = C1 Content, blue = C2 Form, green = C3 Anchored. C2 features dominate the top ranks despite being a minority of total dimensions.
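This per-feature F-statistic ranking corresponds to scikit-learn's `f_classif` (one-way ANOVA across the class labels); a sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

# Toy data: 400 statements, 12 features, 6 cluster labels.
X, y = make_classification(n_samples=400, n_features=12, n_informative=5,
                           n_classes=6, n_clusters_per_class=1, random_state=6)

# One-way ANOVA F-statistic per feature across the 6 clusters.
F, p = f_classif(X, y)
top = np.argsort(F)[::-1][:5]
print([(int(i), round(float(F[i]), 1)) for i in top])
```

A high F means the feature's between-cluster variance dwarfs its within-cluster variance — exactly the property a discriminative genome dimension should have.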

Sparsity Distribution by Channel

Fraction of zero values per channel. C1 Content fields tend to be sparser (many documents don’t discuss religious or indigenous concepts), while C2 Form fields are denser (every document has a binding nature and governance posture). High sparsity can inflate discriminative power for niche fields.

Inter-Channel Correlation

Pairwise correlation between channel-level aggregate scores. Low correlations confirm the channels capture genuinely independent aspects of governance documents — form, content, and semantic positioning provide non-redundant information.

Feature Selection Sensitivity

How stable are the results under different variance-pruning thresholds? If the top features remain consistent across thresholds, the analysis is robust to this preprocessing choice.

UMAP Parameter Sensitivity

UMAP projections depend on hyperparameters (n_neighbors, min_dist). This table shows silhouette scores across parameter combinations, confirming that the 6-family structure is not an artifact of a specific UMAP configuration.

Concept Anchors — The C3 Embedding Channel

The C3 channel measures each statement’s semantic similarity to 45 “anchor” documents — carefully selected exemplars spanning the full governance landscape (from the EU AI Act to the Rome Call for AI Ethics to the CARE Principles for Indigenous Data). Each anchor acts as a conceptual landmark: a statement’s cosine similarity to anchor i measures how closely it aligns with that anchor’s governance philosophy. The F-statistic shows which anchors best discriminate between the 6 policy families.
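The C3 construction reduces to a cosine-similarity matrix between statement embeddings and anchor embeddings. A sketch with scikit-learn; random vectors stand in for the sentence-transformer embeddings, and the embedding dimensionality here is arbitrary:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(7)
# Stand-ins for document embeddings (the study uses sentence-transformer
# embeddings of full texts; 64 dims is purely illustrative).
statement_embs = rng.normal(size=(2021, 64))   # one row per statement
anchor_embs = rng.normal(size=(45, 64))        # one row per anchor

# C3 channel: each statement's cosine similarity to each of the 45 anchors.
C3 = cosine_similarity(statement_embs, anchor_embs)
print(C3.shape)
```

Each column of `C3` is one anchored dimension: how close every statement sits to that conceptual landmark.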

Anchor Discriminative Power (F-statistic)

Each bar represents one of 45 anchors, ranked by F-statistic (one-way ANOVA across 6 families). Higher F-statistics indicate anchors whose similarity scores differ sharply across families — these are the conceptual dimensions that best separate the governance landscape. Religious and indigenous anchors tend to score highest because they activate exclusively on tradition-specific documents.

Anchor Intensity Heatmap (Family × Anchor)

Rows are the 6 policy families; columns are 45 anchors. Cell intensity shows the mean cosine similarity of that family to that anchor. Bright columns indicate anchors that activate broadly; bright rows indicate families with strong conceptual profiles. Dark cells reveal conceptual blind spots — governance dimensions that a family does not engage with.

Key finding: The anchor analysis reveals that tradition-specific governance concepts (indigenous data sovereignty, Islamic jurisprudence, Christian theological anthropology) are the sharpest discriminators between policy families. This is because these concepts have near-zero activation outside their originating tradition, creating stark contrasts. General governance concepts (transparency, accountability, fairness) discriminate poorly because they appear across all families — confirming the “superficial consensus” finding documented in prior literature.

The Innovation-Rights Trade-off

A central axis of AI governance is the tension between promoting innovation and protecting rights. Each statement is scored on gn_innovation_orientation (0 = precautionary, 100 = innovation-enabling) and a composite rights score (human dignity, informed consent, digital labor rights). If the global governance system were achieving synthesis, many statements would score high on both. Instead, the data reveals a strong negative correlation (Spearman ρ = −0.41, p < 10⁻⁸²): innovation-first and rights-first framing are empirically opposed.
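The reported correlation is a standard Spearman rank test; a sketch with scipy on toy scores with an induced negative relationship (the variable names mirror the document's fields, the data do not):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(8)
# Toy stand-ins for gn_innovation_orientation and the rights composite,
# constructed with a built-in negative trade-off plus noise.
innovation = rng.uniform(0, 100, 500)
rights = np.clip(80 - 0.5 * innovation + rng.normal(0, 15, 500), 0, 100)

rho, p = spearmanr(innovation, rights)
print(round(float(rho), 2), bool(p < 0.001))
```

Spearman (rank-based) rather than Pearson is the appropriate choice here, since rubric scores are ordinal and bounded.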

Innovation vs Rights by Policy Family

Mean innovation orientation and rights composite scores for each of the 6 policy families. Green bars = innovation mean, red bars = rights mean. Innovation Champions score 73.8 on innovation but only 2.5 on rights. The stark asymmetry confirms the trade-off is structural, not incidental.

Regional Innovation-Rights Ratio

For each region, the ratio of innovation-oriented to rights-oriented statements. Ratios above 1.0 indicate innovation dominance. Africa (5.1:1) and Asia-Pacific show the strongest innovation-first tendency; Europe leans toward rights.

Paradigm Balance Over Time

How the innovation-vs-rights balance has shifted across three eras: pre-2020 (innovation-dominant), 2020–2022 (crossover), and 2023–2026 (regulatory turn). The 2020 crossover coincides with the EU AI Act legislative process.

Statement-Level Innovation vs Rights

Each dot is one statement, positioned by innovation orientation (x-axis) and rights composite (y-axis). The empty upper-right quadrant confirms that no documents successfully integrate high innovation promotion with strong rights protections — the ‘both/and’ synthesis has not materialized.

Bottom line: The innovation-rights trade-off is the sharpest axis of disagreement in global AI governance. The empty upper-right quadrant of the scatter plot — where innovation-promoting, rights-protective documents would appear — represents the governance synthesis that policymakers aspire to but have not yet achieved. The closest approximation is the “Low-Specificity” family, which achieves balance through vagueness rather than genuine integration.

Three-Channel Architecture

The Governance Genome encodes each AI governance statement across three complementary channels, each capturing a distinct analytical lens on the document.

C1 Content (155 dims) + C2 Form (26 dims) + C3 Anchored (45 dims) = 226 Total Dims

Methodology

This study uses computational concept induction to identify recurring structural and substantive patterns across 2,021 AI governance statements. Each statement is scored on 181 genome dimensions (155 content + 26 form) using LLM-based structured expert simulation with calibrated 5-level rubrics. The resulting high-dimensional vectors are then subjected to variance-based feature selection, K-means clustering, and UMAP projection to reveal emergent policy families. This approach is exploratory and data-driven; the resulting clusters describe empirical co-occurrence patterns, not normative categories.

Coding Methodology

All 181 genome dimensions were scored by LLM-based structured expert simulation (Claude, Anthropic) using calibrated rubrics with 5-level ordinal anchors. Each scoring agent received the full document text and a dimension-specific rubric. 246 scoring agents operated across 3 sessions with rate-limit-safe batching. The table below documents the provenance of each methodological decision.

Pipeline Phases

Phase | Name | Description | Status
0 | Schema Design | Define 3-channel architecture: C1 content (155), C2 form (26), C3 anchored (45) | Complete
1 | C2 Form Coding | Score structural/procedural metadata for 2,021 statements (26 gn_ fields) | Complete
2 | C1 Content Coding | Score substantive content dimensions for 2,021 statements (155 gn_ fields) | Complete
3 | Quality Audit | Validate all 2,021 statements have exactly 181 gn_ fields (C1 + C2) | Complete
4 | Feature Selection | Variance-based pruning, column weighting, correlation filtering | Complete
4b | Clustering | K-means clustering (k=6), silhouette analysis, UMAP projection | Complete
5 | C3 Anchored Embeddings | Sentence-transformer embeddings anchored to 45 concept seeds | Next
6–8 | Analysis & Reporting | Cross-tabulations, temporal analysis, family profiling, dashboard generation | Planned

Scoring Scale

Each genome dimension is scored on a 5-level ordinal scale (0–100), with calibrated anchor descriptions per dimension.

0 = Absent / Not Applicable
15 = Minimal / Passing Mention
40 = Moderate / Partial
65 = Substantial / Detailed
90 = Comprehensive / Central
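As a minimal illustration of how such a scale can be applied, the five anchors can be held in a lookup table and a raw 0–100 score snapped to its nearest calibrated level. The helper below is hypothetical, not part of the scoring pipeline.

```python
# Hypothetical helper: snap a raw 0-100 score to the nearest of the five
# calibrated anchor levels. Labels follow the scale documented above.
ANCHORS = {
    0: "Absent / Not Applicable",
    15: "Minimal / Passing Mention",
    40: "Moderate / Partial",
    65: "Substantial / Detailed",
    90: "Comprehensive / Central",
}

def nearest_anchor(score: float) -> tuple[int, str]:
    """Return the (level, label) of the anchor closest to a raw score."""
    level = min(ANCHORS, key=lambda a: abs(a - score))
    return level, ANCHORS[level]

print(nearest_anchor(72))  # -> (65, 'Substantial / Detailed')
```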

Reliability Metrics

Metric | Value | Notes
Statements scored | 2,021 | Quality-gated from 3,044 total
Total data points | 365,801 | 2,021 × 181 gn_ fields
Scoring agents | 246 | LLM-based structured expert simulation across 3 sessions
Field completeness | 100% | All 2,021 statements have exactly 181 gn_ fields
Silhouette (k=6) | 0.155 | C1 + C2 channels only (Phase 4b)
Features retained | 347 / 374 | After variance-based pruning

Limitations

What This Study Cannot Tell You

  • Causal claims: Cluster membership reflects co-occurrence of coded features, not causal mechanisms. A statement appearing in the "Development-Oriented" family does not mean its authors prioritized innovation over rights.
  • Implementation or impact: The genome encodes what statements say, not what institutions do. A high enforcement score reflects textual specificity, not real-world compliance.
  • Representativeness: The corpus covers dozens of countries and multiple institution types, but is not a probability sample. English-language and digitally accessible documents are over-represented.
  • Ground truth for clustering: There is no external gold standard for "correct" policy families. The k=6 solution is one defensible partition; other values of k yield different but plausible groupings (see Stability subtab).
  • Tradition-specific nuance: Scoring tradition-specific governance concepts (e.g., Islamic jurisprudence, Indigenous data sovereignty) through a standardized rubric inevitably compresses culturally embedded meanings.

How This Research Could Be Misused

  • Ranking or grading jurisdictions: Policy families are descriptive clusters, not quality tiers. Using cluster labels to rank countries as "more advanced" or "less developed" in AI governance would be a misapplication.
  • Cherry-picking to justify predetermined positions: The high dimensionality of the genome means that selective emphasis on particular dimensions can support almost any narrative. Always consider the full vector.
  • Conflating coding scores with policy quality: A low score on a dimension (e.g., enforcement specificity) may reflect deliberate policy design (e.g., principles-first approach), not a deficiency.
  • Treating LLM-generated scores as human expert judgment: All scores were produced by LLM-based structured expert simulation, not by domain experts. They should be treated as systematic approximations, not authoritative assessments.

LLM-Based Structured Expert Simulation

The genome schema, scoring rubrics, and clustering parameters were developed through LLM-based structured expert simulation. This approach uses large language models prompted with domain-specific personas and calibrated rubrics to produce systematic, reproducible scores across all 2,021 statements. The pipeline is fully automated and, given the same model, prompts, and documents, reproducible in principle.

Schema Design: LLM-simulated 8-expert panel, 43 seed concepts, 3-channel architecture
C1 Content Rubrics: 155 dimension rubrics with calibrated 5-level anchors
C2 Form Rubrics: 26 structural/procedural dimension rubrics
Quality Assurance: Multi-pass validation, 100% field completeness verification

Citation

Tapestry (Lin, 2026). The Global AI Principles & Governance Database. Governance Genome module: Fingerprint 2.0 "The Prism" — 3-channel encoding of 2,021 AI governance statements across 226 dimensions. Available at: github.com/cjimmylin/cei-ai-statements.

Validation, Robustness & Reliability

This tab presents three complementary analyses that assess the trustworthiness of the Governance Genome clustering. The 6 policy families identified in the main analysis are only meaningful if they are (1) internally valid — genuinely separable by a classifier, (2) robust — not artifacts of a specific embedding model, and (3) reliable — reproducible across different prompt phrasings.

Method: A Random Forest classifier (200 trees, 5-fold CV) tests cluster separability. Cross-lingual robustness re-embeds all 2,021 statements with a uniform multilingual model. Inter-rater reliability uses 3 LLM coders with different prompt variants (standard, rephrased, re-anchored scale) on 200 stratified statements across all 156 scoring dimensions.

Cluster Validation via Random Forest Classification

If the 6 policy families represent genuinely distinct governance approaches, a classifier trained on the genome dimensions should be able to predict which family a statement belongs to. We train a 200-tree Random Forest on the 226-dimensional genome vectors, using 70/30 train-test split and 5-fold cross-validation. High accuracy (>90%) confirms that the clusters capture real, separable structure in the data — not noise.
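A hedged sketch of this separability check, using scikit-learn on synthetic stand-in data (the real 226-dimensional genome vectors and family labels are not reproduced here):

```python
# Sketch of the cluster-separability check: 200-tree Random Forest, 70/30 split,
# 5-fold CV. Synthetic stand-in data with 6 planted "families".
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
centers = rng.normal(size=(6, 30)) * 3
X = np.vstack([c + rng.normal(size=(60, 30)) for c in centers])
y = np.repeat(np.arange(6), 60)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
rf = RandomForestClassifier(n_estimators=200, random_state=0)
cv_acc = cross_val_score(rf, X_tr, y_tr, cv=5).mean()  # mean 5-fold CV accuracy
test_acc = rf.fit(X_tr, y_tr).score(X_te, y_te)        # held-out 30% accuracy
print(f"CV accuracy {cv_acc:.3f}, test accuracy {test_acc:.3f}")
```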

Reading the KPIs: RF Accuracy is the mean 5-fold cross-validation accuracy. Test Accuracy is the held-out 30% performance. Form-Only Acc shows that C2 form features alone achieve 91.4% — institutional metadata (who wrote it, how binding it is) matters more than content. IR Spearman ρ measures the innovation-vs-rights trade-off correlation. ARI (Adjusted Rand Index) measures overlap with org_type — low values mean clusters discover novel structure beyond institutional identity. Cramér's V measures association strength.

RF Classification Accuracy (5-Fold CV)

Each bar shows the accuracy of one CV fold. The red dashed line marks the mean. Consistent bars indicate stable performance across data splits.

Per-Cluster F1 Score

F1 combines precision and recall per family. Higher bars mean the classifier reliably identifies that family. Lower F1 (e.g., Professional) suggests boundary overlap with neighboring clusters.

Confusion Matrix (6 × 6)

Rows = true family, columns = predicted family. Bright diagonal = correct predictions. Off-diagonal cells reveal which families the classifier confuses — e.g., Professional (C4) and Regulatory (C5) may share form features like standards-based governance.

Permutation Importance (Top 30)

Features ranked by how much accuracy drops when their values are shuffled. Blue = C2 Form, Orange = C1 Content, Green = C3 Anchored. C2 dominance confirms that institutional form drives cluster structure.

Gini Importance (Top 30)

Gini importance measures how much each feature reduces impurity across all decision tree splits. Unlike permutation importance, this is computed during training. Features high on both lists are the most robust discriminators.
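The two importance measures can be contrasted on a toy problem. This is an illustrative scikit-learn sketch, not the study's actual feature set: only the first two synthetic features carry signal, and both measures should recover that.

```python
# Gini importance comes from the fitted forest (impurity reduction at training time);
# permutation importance comes from shuffling each column on held-out data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # only features 0 and 1 matter

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

gini = rf.feature_importances_  # impurity-based, computed during training
perm = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)

print("Gini:", np.round(gini, 3))
print("Permutation:", np.round(perm.importances_mean, 3))
```

Features that rank high under both measures, as in the main analysis, are the most robust discriminators.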

Key finding — form over content: The top permutation-importance features are overwhelmingly C2 (form) dimensions: gn_addressee__governments, gn_binding_nature__soft_law, gn_authority_type__sovereign_state. C2 features account for 64.6% of total permutation importance. This means the 6 policy families are primarily distinguished by who writes governance documents, for whom, and with what legal force — not by the specific ethical principles they articulate. This is the empirical confirmation of the "form over content" finding reported in the main analysis.

Mediation Analysis — Model Accuracy

Each bar shows classifier accuracy using different feature subsets. Org Type Only (47.6%) shows that knowing the institution alone is insufficient. C2 Form Only (91.4%) nearly matches the full model. C1 Content Only (74.5%) adds some signal but cannot separate clusters alone. The gap between C2 form and org type alone (+43.8 points) shows that form captures structure well beyond institutional identity.
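The feature-subset ablation behind this mediation analysis can be sketched as follows; the column groups are synthetic stand-ins for the real C2 and org_type feature sets, constructed so one group is informative and the other noisy.

```python
# Sketch of feature-subset ablation: train the same classifier on different
# column groups and compare cross-validated accuracies.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 3, size=n)                             # 3 stand-in "families"
strong = y[:, None] + rng.normal(scale=0.3, size=(n, 4))   # informative ("form"-like)
weak = y[:, None] + rng.normal(scale=3.0, size=(n, 4))     # noisy ("org type"-like)

subsets = {
    "form-like": strong,
    "orgtype-like": weak,
    "full": np.hstack([strong, weak]),
}
acc = {
    name: cross_val_score(RandomForestClassifier(random_state=0), Xs, y, cv=5).mean()
    for name, Xs in subsets.items()
}
print({k: round(v, 3) for k, v in acc.items()})
```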

ARI vs Metadata & Effect Size

ARI (Adjusted Rand Index) measures cluster-metadata overlap. Values near 0 mean clusters discover novel structure; values near 1 mean they merely replicate the metadata. Cramér's V measures association strength. Org_type has moderate V (0.53) but low ARI (0.20), meaning clusters correlate with org type but are not reducible to it.
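Both association measures have short closed forms. A minimal pure-Python sketch, assuming nondegenerate labelings (at least two clusters on each side):

```python
# Adjusted Rand Index between two labelings, and Cramér's V from the implied
# contingency table. Pure Python; assumes nondegenerate labelings.
from collections import Counter
from math import comb, sqrt

def contingency(a, b):
    return Counter(zip(a, b))

def adjusted_rand_index(a, b):
    n = len(a)
    cont = contingency(a, b)
    sum_ij = sum(comb(c, 2) for c in cont.values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

def cramers_v(a, b):
    n = len(a)
    cont = contingency(a, b)
    ra, cb = Counter(a), Counter(b)
    chi2 = sum(
        (cont.get((i, j), 0) - ra[i] * cb[j] / n) ** 2 / (ra[i] * cb[j] / n)
        for i in ra for j in cb
    )
    k = min(len(ra), len(cb)) - 1
    return sqrt(chi2 / (n * k))
```

ARI is 1.0 for identical partitions (even under relabeling) and near 0 for independent ones, which is why low ARI against org_type means the clusters carry structure beyond institutional identity.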

Cross-Lingual Embedding Robustness

The Governance Genome uses a split embedding model: all-MiniLM-L6-v2 for English texts and paraphrase-multilingual-MiniLM-L12-v2 for non-English texts. This is a methodological choice — are the resulting clusters an artifact of using two different models? To test this, we re-embed all 2,021 statements with a single uniform multilingual model and compare the resulting C3 channel scores.

A high correlation (r > 0.8) would mean the C3 channel is model-invariant. A low correlation with high cluster ARI would mean clusters survive despite C3 changes — because C2 form features dominate. The latter is what we observe.

Reading the KPIs: C3 Pearson r and Spearman ρ measure how much individual C3 scores change when switching to the uniform model (0.55 = moderate change). Cluster ARI measures whether the 6 families survive the change (0.72 = substantial preservation). EN Cosine shows that English embeddings shift dramatically (0.43) between the two models — they produce very different vector spaces. Non-EN Cosine is near 1.0 because both pipelines use the same multilingual model for non-English texts.

Per-Anchor C3 Correlation (Pearson r)

Each bar represents one of 45 concept anchors. The Pearson r shows how consistently each anchor’s cosine-similarity scores correlate between the original split-model pipeline and the uniform multilingual pipeline. Green bars (r > 0.5) indicate stable anchors; yellow (0.3–0.5) are moderately sensitive; red (< 0.3) are highly model-dependent. Anchors with culturally specific content (e.g., religious texts) tend to be more stable because they activate on distinctive vocabulary.
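The per-anchor statistic reduces to two elementary computations, cosine similarity and Pearson r. A minimal pure-Python sketch on toy vectors (the score lists below are illustrative, not actual pipeline output):

```python
# Cosine similarity of an embedding to an anchor, and Pearson r between the
# per-statement score vectors produced by two embedding pipelines.
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Toy per-statement cosine scores for one anchor under the two pipelines.
scores_split = [0.82, 0.41, 0.67, 0.90]
scores_uniform = [0.78, 0.44, 0.61, 0.88]
print(round(pearson_r(scores_split, scores_uniform), 3))
```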

Language Cosine Similarity (Top 20)

For each language, this shows the mean cosine similarity between original and uniform embeddings. English (n=1,590) shows low similarity (0.43) because all-MiniLM-L6-v2 and paraphrase-multilingual-MiniLM-L12-v2 produce very different vector spaces for English. All other languages score near 1.0 because both pipelines use the same multilingual model.

Cluster Size: Original vs Uniform-C3

When we re-cluster with uniform-C3 embeddings, family sizes shift but the overall structure is preserved (ARI = 0.72). Some statements move between families, but the 6-family solution remains stable. This confirms clusters are driven primarily by C2 form features, not C3 embedding artifacts.

Bottom line: The C3 anchored-embedding channel is moderately sensitive to embedding model choice (r = 0.55), but the 6 policy families are robust (ARI = 0.72). This is a positive finding: the clusters are not an artifact of the specific sentence-transformer model. They survive because C2 form features — which are model-independent categorical variables — dominate cluster structure (RF showed C2 alone achieves 91.4% classification accuracy). The split-model approach is a real methodological choice, not a convenience, but it does not compromise the main findings.

Expanded Inter-Rater Reliability Study

The Governance Genome uses LLM-based coding: each statement is scored on 156 dimensions by a Claude model given a structured rubric. A key threat to validity is prompt sensitivity — would different prompt phrasings produce different scores? To test this, we created 3 “coders” with intentionally varied prompts and scored 200 stratified statements:

  • Coder A — Standard descriptions with 0/50/100 scale anchoring
  • Coder B — Rephrased descriptions (“How much does the document address: X?”), same scale
  • Coder C — Different scale anchoring: 0 = absent, 40 = briefly touched, 70 = substantive, 100 = primary focus

We measure agreement using Krippendorff’s α (the standard for content analysis reliability; α ≥ 0.67 is acceptable, ≥ 0.80 is good), ICC(3,k) for interval-scaled scores, Fleiss’ κ for categorical enums, and Jaccard similarity for multi-label tag-lists.
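Of these, Krippendorff's α for interval data has the least familiar formula. A minimal pure-Python sketch of the pairwise-disagreement form (α = 1 − D_o/D_e), not the study's production implementation:

```python
# Krippendorff's alpha for interval data, pairwise-disagreement formulation.
# `data` holds one field's ratings: one list per statement, one value per coder;
# statements with fewer than 2 ratings are dropped as unpairable.
def krippendorff_interval(data):
    units = [u for u in data if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    # Observed disagreement: squared differences over ordered pairs within units.
    d_o = sum(
        sum((a - b) ** 2 for a in u for b in u) / (len(u) - 1) for u in units
    ) / n
    # Expected disagreement: squared differences over all ordered pairs of values.
    d_e = sum((a - b) ** 2 for a in values for b in values) / (n * (n - 1))
    return 1.0 - d_o / d_e

# Three coders in perfect agreement on four statements -> 1.0
print(krippendorff_interval([[10, 10, 10], [40, 40, 40], [65, 65, 65], [90, 90, 90]]))
```

Systematic disagreement drives α below 0, which is why the 0.67 acceptability and 0.80 "good" thresholds are meaningful benchmarks rather than guaranteed floors.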

Reading the KPIs: Score α (mean/median) measures agreement across 140 continuous (0–100) score fields. ICC(3,k) is the intraclass correlation for 3 raters (consistency model) — values above 0.90 indicate excellent reliability. Enum α uses the nominal variant of Krippendorff’s α for 9 categorical fields (e.g., governance_posture, binding_nature). Tag Jaccard measures set-overlap agreement for 8 multi-label fields (e.g., rhetoric, sector_scope). Flagged counts fields below the α = 0.67 threshold.

Krippendorff's α Distribution (Score Fields)

Each dot represents one of 140 score fields, positioned by channel (x-axis) and α value (y-axis). Points are jittered horizontally to reduce overlap. The red dashed line marks the acceptability threshold (α = 0.67); the yellow dotted line marks “good” (α = 0.80). Nearly all fields cluster above 0.90, indicating excellent agreement. Orange = C1 Content, Blue = C2 Form, Green = Crosswalk.

Channel-Level α Comparison

Mean and median α for each channel. C2 Form fields (0.950) are slightly more reliable than C1 Content fields (0.943), likely because form features (binding nature, geographic scope) are less ambiguous than content features (epistemic humility, value tensions). The crosswalk channel has only 1 scored field.

Enum (α) & Tag-List (Jaccard) Reliability

Purple bars show nominal α for each enum field (e.g., governance_posture achieves 0.966 across 13 categories). Green bars show mean Jaccard similarity for tag-list fields (e.g., sector_scope achieves 0.917). Higher values mean the 3 coders selected the same categories/tags despite different prompt phrasings.
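The Jaccard statistic itself is a one-liner over tag sets; the example tags below are illustrative, not drawn from the corpus.

```python
# Jaccard similarity between the tag sets two coders assigned to the same field:
# intersection over union, with both-empty treated as perfect agreement.
def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

coder_a = {"healthcare", "finance", "education"}
coder_b = {"healthcare", "finance", "defense"}
print(round(jaccard(coder_a, coder_b), 3))  # 2 shared / 4 total -> 0.5
```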

Flagged Fields (Below Threshold)

Fields with α < 0.67 require attention. These are typically culturally specific concepts with sparse activation (most statements score 0) where small differences in the few non-zero scores disproportionately affect α.

Bottom line: The Governance Genome coding scheme is highly reliable across prompt variants. Mean α = 0.942 far exceeds the 0.67 acceptability threshold and the 0.80 “good” threshold. 136 of 140 score fields achieve α ≥ 0.80. All 9 enum fields exceed 0.91. The only flagged field — gn_rahmah (Islamic mercy/compassion, α = 0.622) — is a culturally specific concept that activates on fewer than 5% of statements, making it statistically difficult to achieve high α. This does not compromise the main analysis, as the 6 policy families are driven by high-reliability C2 form features.
Why high α for indigenous/religious fields? Fields like gn_kedusha (α = 0.993), gn_tribal_sovereignty (0.993), and gn_indigenous_data_sovereignty (0.992) achieve near-perfect agreement because they are effectively binary: either a document discusses indigenous data governance or it doesn't. The coders agree perfectly on the 0-vs-nonzero distinction. The hardest fields are those requiring subjective degree judgment — gn_value_tensions (0.897), gn_collective_individual (0.884), gn_epistemic_humility (0.882) — but even these exceed the “good” threshold.