💡 Welcome to the Methodological Appendix!

This file provides a fully transparent and reproducible account of the data analysis performed for the paper, ‘Breaking in or Breaking Through? How Local Specialisations Shape the Integration of AI Technologies’ (doi: https://doi.org/10.1080/10438599.2025.2558626). Its goal is to offer a clear roadmap of the methodological steps, from initial data sourcing to the final regression analysis.


This document details the complete analytical pipeline, structured into five key parts:

1. Calculating Technological Specializations (Section 1.1) ⚙️
This section describes how raw patent data is processed to calculate Revealed Technological Advantage (RTA) for different countries and for AI as a whole. This is performed across three distinct time intervals to capture technological evolution.
2. Constructing and Visualizing Technological Spaces (Sections 1.2 & 1.3) 🌐
Here, we explain the creation of the Global Technological Space (GTS), based on the co-occurrence of technological fields in patents, and the dynamic AI-specific Technological Space (ATS). This section also covers the visualization of national trajectories within these spaces.
3. Generating Supporting Figures (Section 2) 📊
This part outlines the code used to create the descriptive figures presented in the paper, such as the growth of AI patents and the share of specialized fields.
4. Robustness Checks & Permutation Analysis (Section 3) 🎲
This section details the permutation-based robustness checks, which verify that our findings are not driven by random chance.
5. Econometric Analysis (Section 4) 📈
Finally, we present the regression models used to formally test our hypotheses, including the data setup, model specifications, and interpretation of the results.

1. Technological Spaces Based on Technological Fields

1.1. Calculating Specializations for Different Time Intervals

This section details the foundational step of our analysis: calculating the technological specializations of countries and of AI itself. We begin by loading extensive patent datasets and defining several custom functions to streamline the process. The end goal is to compute Revealed Technological Advantage (RTA) scores for three distinct time intervals, which will later serve as the basis for constructing our technological spaces.

Data Loading and Initial Setup

We start by loading the necessary R libraries and defining a set of custom functions that will be used repeatedly for data aggregation and weighting. The primary dataset is a large file containing patent applications and their associated inventor locations, which is loaded in manageable chunks to optimize memory usage.

A preview of the primary patent data (ipc_all_patents_part2_df) is shown below. Each row represents a patent application (appln_id) linked to an inventor’s country (ctry_code) and a technological field (techn_field_nr).

kable(as.data.frame(ipc_all_patents_part2_df[1:6,]))
appln_id ctry_code techn_field_nr weight priority_year
203438 JP 2 9 2000
203438 JP 2 9 2000
203438 JP 9 1 2000
203438 JP 9 1 2000
203521 US 15 375 1996
203521 US 16 625 1996

Note that the original weight column from PATSTAT is disregarded in our analysis. We recalculate a fractional weight internally to ensure that each patent application has a total weight of 1, distributed equally among its assigned technological fields.
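The fractional weighting can be sketched in base R as follows (illustrative rows; the actual pipeline performs this inside its custom functions, and the column name field_weight follows the tables shown later in this section):

```r
# Base-R sketch of the fractional weighting described above
df <- data.frame(
  appln_id       = c(203438, 203438, 203521, 203521),
  ctry_code      = c("JP", "JP", "US", "US"),
  techn_field_nr = c(2, 9, 15, 16)
)

# Split a total weight of 1 equally across each application's fields:
df$field_weight <- ave(df$appln_id, df$appln_id,
                       FUN = function(x) 1 / length(x))

# Each patent application now sums to exactly 1:
aggregate(field_weight ~ appln_id, df, sum)
```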

Next, we load two supplementary datasets:

  1. AI Patent Data (other_files/IPCs_AI.csv): A curated list of patent applications identified as being related to Artificial Intelligence.

  2. IPC Technology Names (other_files/ipc_technology.csv): A reference file that maps technological field numbers to their descriptive names and sectors.

The AI Patent Data (ai_patents_df) looks like this:

head(ai_patents_df)

The IPC Technology Names (ipc_names_df) looks like this:

kable(as.data.frame(ipc_names_df[1:6,]))
field_nr sector field_name techn_field_nr
1 Electrical engineering Electrical machinery, apparatus, energy 1
2 Electrical engineering Audio-visual technology 2
3 Electrical engineering Telecommunications 3
4 Electrical engineering Digital communication 4
5 Electrical engineering Basic communication processes 5
6 Electrical engineering Computer technology 6

Calculating Specializations for Interval 1 (1974-1988)

The core of this section involves calculating the specialization scores for our first time interval, 1974-1988. This process is repeated identically for the subsequent two intervals.

The first step is to filter the main patent dataset for the specified period. We then apply our custom functions, group_by_applnID() and group_by_ctry_and_techn_field(), to fractionally count patent activities.

The group_by_applnID() function assigns an equal weight to each technological field within a single patent. For instance, if a patent is classified under four fields, each field receives a weight of 0.25. The result is a weighted dataset:

kable(as.data.frame(region_tech_fields_1_df[1:6,]))
appln_id ctry_code techn_field_nr field_weight
206163 DE 1 0.50
206163 DE 1 0.50
214019 FR 9 0.25
214019 FR 9 0.25
214019 FR 29 0.25
214019 FR 29 0.25

Next, group_by_ctry_and_techn_field() aggregates these weights, summing them up for each country-technology pair. This yields the total fractional count of patents for each country in each technological field.

kable(as.data.frame(region_tech_fields_1_df[1:6,]))
ctry_code techn_field_nr n_tech_reg
AD 20 1
AD 24 1
AD 28 3
AD 32 1
AD 33 1
AD 34 3
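The aggregation performed by group_by_ctry_and_techn_field() can be sketched in base R (illustrative data, not the pipeline code itself):

```r
# Sum the fractional field weights for every country-technology pair:
df <- data.frame(
  ctry_code      = c("DE", "DE", "FR", "FR"),
  techn_field_nr = c(1, 1, 9, 29),
  field_weight   = c(0.5, 0.5, 0.25, 0.25)
)

n_tech_reg <- aggregate(field_weight ~ ctry_code + techn_field_nr, df, sum)
names(n_tech_reg)[3] <- "n_tech_reg"
n_tech_reg  # total fractional patent count per country and field
```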

This aggregated data, saved as reg_tech_FirstPeriod.csv, is transformed into a country-technology matrix. The matrix rows represent countries, columns represent technological fields, and the values are the fractional patent counts. It looks like this:

kable(as.matrix(mat_reg_tech1[1:20, 1:12]), caption = "Sample of the Country-technology matrix")
Sample of the Country-technology matrix
1 2 3 4 5 6 7 8 9 10 11 12
AG 1 0 0 0 0 0 0 0 0 0 0 0
AM 1 0 0 0 0 0 0 0 0 0 0 0
AR 11 7 6 1 2 2 0 0 3 12 1 6
AT 758 299 198 23 131 64 1 28 279 461 38 130
AU 1403 606 409 48 178 272 10 81 509 1321 149 659
BA 0 0 0 0 0 0 0 0 0 0 0 0
BB 0 0 0 0 0 0 0 0 0 1 0 0
BE 261 97 159 32 54 43 0 26 182 196 65 80
BG 992 335 166 36 333 455 0 114 228 1284 137 340
BI 1 0 0 0 0 0 0 0 0 0 0 0
BM 2 2 0 0 0 0 0 0 0 0 0 0
BO 2 0 0 0 0 0 0 0 0 1 0 1
BR 1759 647 754 47 138 325 12 50 296 962 48 851
BS 4 0 0 0 0 0 0 0 0 0 0 0
BU 0 0 0 0 0 0 0 0 0 1 0 0
CA 3827 1409 1474 254 553 784 14 372 1368 2915 252 1041
CH 2194 788 372 108 299 242 4 225 841 2526 120 754
CL 0 1 1 0 0 1 0 0 1 1 0 2
CN 931 233 146 20 106 317 0 132 306 994 45 223
CO 8 2 0 0 0 0 0 0 0 0 0 4
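Casting the long country-field table into this matrix can be done, for example, with base R's xtabs() (a sketch on toy data; the pipeline uses its own reshaping step):

```r
df <- data.frame(
  ctry_code      = c("AD", "AD", "AT"),
  techn_field_nr = c(20, 24, 20),
  n_tech_reg     = c(1, 1, 5)
)

# Rows become countries, columns technological fields,
# cells the fractional patent counts (missing pairs become 0):
mat <- xtabs(n_tech_reg ~ ctry_code + techn_field_nr, df)
mat
```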

Finally, we use this matrix to calculate the Revealed Technological Advantage (RTA) for each country in each field. RTA is a continuous index that compares a country's share of patents in a given technology with that technology's share in the global patent portfolio. An RTA value greater than or equal to 1 indicates a specialization.

kable(as.data.frame(reg_RCA1_df[1:6,]))
ctry_code techn_field_nr RCA
AG 1 6.4097284
AM 1 12.8194569
AR 1 0.4209374
AT 1 0.7479332
AU 1 0.4791203
BA 1 0.0000000
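The RTA computation follows the standard Balassa (location-quotient) logic, sketched here in base R on a toy country-technology matrix (the pipeline's own implementation may differ in detail):

```r
# Toy matrix: rows = countries, columns = technological fields
mat <- rbind(A = c(10, 90),
             B = c(50, 50))

# RTA = (field's share in the country's portfolio) /
#       (field's share in the world portfolio)
rta <- (mat / rowSums(mat)) /
  matrix(colSums(mat) / sum(mat), nrow(mat), ncol(mat), byrow = TRUE)
rta  # values >= 1 indicate a specialization
```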

This entire process is then repeated, this time using only the AI-related patents from the first interval to calculate AI-specific RTAs for each country. For the AI patents, the RTAs look like this:

kable(as.data.frame(reg_RCA1_AI_df[1:12,]))
ctry_code techn_field_nr RCA
AT 1 0
AT 10 0
AT 11 0
AT 12 0
AT 13 0
AT 17 0
AT 2 0
AT 20 0
AT 23 0
AT 24 0
AT 25 0
AT 26 0

The general and AI-specific RTA dataframes are then merged. The resulting file for the first interval shows, for each country and technological field, both its general specialization (RCA_Gen) and its AI-specific specialization (RCA_AI), as highlighted below for the whole dataset, and for Japan as an example.

#Resulting file:
kable(as.data.frame(rca_data_period_1_df[1:6,]))
ctry_code techn_field_nr RCA_Gen RCA_AI Period
AD 1 0 NA 1974-1988
AD 10 0 NA 1974-1988
AD 11 0 NA 1974-1988
AD 12 0 NA 1974-1988
AD 13 0 NA 1974-1988
AD 14 0 NA 1974-1988
#Example Japan:
kable(as.data.frame(rca_data_period_1_df[rca_data_period_1_df$ctry_code == "JP",][1:6,]))
ctry_code techn_field_nr RCA_Gen RCA_AI Period
2731 JP 1 1.1268685 1.4025974 1974-1988
2732 JP 10 1.0109760 1.2022263 1974-1988
2733 JP 11 0.7067914 0.0000000 1974-1988
2734 JP 12 1.0644547 1.0957792 1974-1988
2735 JP 13 0.6104061 0.7012987 1974-1988
2736 JP 14 0.6177233 NA 1974-1988

A key methodological step follows: we treat the entire corpus of AI patents as if it belonged to a single, hypothetical ‘country’ named AI_pat. This novel approach allows us to calculate the RTA for AI itself across all technological fields, providing a benchmark against which national specializations can be compared. The resulting data is saved for later use, and it looks like this:

kable(as.data.frame(region_tech_ai_1_df[region_tech_ai_1_df$ctry_code == "AI_pat",][1:6,]))
ctry_code techn_field_nr n_tech_reg
AI_pat 1 1.750000
AI_pat 2 2.250000
AI_pat 3 3.533333
AI_pat 4 2.750000
AI_pat 5 12.916667
AI_pat 6 279.433333
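The AI_pat step amounts to aggregating the fractionally weighted AI-patent subset over technological fields and appending it under a single country code, e.g. (base-R sketch with illustrative data):

```r
ai_df <- data.frame(                 # fractionally weighted AI patents
  techn_field_nr = c(6, 6, 4),
  field_weight   = c(0.5, 0.5, 1)
)

agg <- aggregate(field_weight ~ techn_field_nr, ai_df, sum)
ai_ctry <- data.frame(ctry_code      = "AI_pat",
                      techn_field_nr = agg$techn_field_nr,
                      n_tech_reg     = agg$field_weight)
ai_ctry  # rows for the hypothetical 'country' AI_pat, one per field
```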

Consolidating Data Across All Intervals

The calculation process detailed above is repeated for the remaining two intervals: 1989-2003 and 2004-2018. After processing all periods, the three interval-specific RTA files are combined into a single, comprehensive dataset named IPC_RCAs, which is saved for later use (Files_created_with_the_code/data/files_code_Fields_analysis/IPC_RCAs.csv). This file contains the complete history of general and AI-specific specializations for all countries across the three time periods. Using Japan again as an example, the file looks like this for each interval:

kable(as.data.frame(IPC_RCAs[IPC_RCAs$ctry_code == "JP" & IPC_RCAs$Period == "1974-1988",][1:6,]))
ctry_code techn_field_nr RCA_Gen RCA_AI Period
2731 JP 1 1.1268685 1.4025974 1974-1988
2732 JP 10 1.0109760 1.2022263 1974-1988
2733 JP 11 0.7067914 0.0000000 1974-1988
2734 JP 12 1.0644547 1.0957792 1974-1988
2735 JP 13 0.6104061 0.7012987 1974-1988
2736 JP 14 0.6177233 NA 1974-1988
kable(as.data.frame(IPC_RCAs[IPC_RCAs$ctry_code == "JP" & IPC_RCAs$Period == "1989-2003",][1:6,]))
ctry_code techn_field_nr RCA_Gen RCA_AI Period
9486 JP 1 1.1395992 0.9834465 1989-2003
9487 JP 10 1.0147327 0.8904603 1989-2003
9488 JP 11 0.6029495 0.6173858 1989-2003
9489 JP 12 1.0394907 0.9396071 1989-2003
9490 JP 13 0.5833007 0.5805270 1989-2003
9491 JP 14 0.6666774 1.8521575 1989-2003
kable(as.data.frame(IPC_RCAs[IPC_RCAs$ctry_code == "JP" & IPC_RCAs$Period == "2004-2018",][1:6,]))
ctry_code techn_field_nr RCA_Gen RCA_AI Period
18341 JP 1 1.2953810 1.3244132 2004-2018
18342 JP 10 0.8728225 0.9737969 2004-2018
18343 JP 11 0.5286471 0.9540950 2004-2018
18344 JP 12 0.8957888 1.1166269 2004-2018
18345 JP 13 0.8071723 1.3124353 2004-2018
18346 JP 14 0.5785004 3.3393324 2004-2018

To facilitate further analysis and visualization, we create consolidated summary files for each interval. These files combine the RTA scores of the four focus countries (US, CN, KR, JP) and the AI_pat entity into a single, wide-format table. The file for the first interval is named Files_created_with_the_code/data/files_code_Fields_analysis/Metrics_First_period.csv, with analogous names for the second and third intervals. The data for the first interval looks like this:

kable(as.data.frame(First_period[1:6,]))
techn_field_nr sector field_name RCA_US RCA_CN RCA_KR RCA_JP RCA_AI
1 Electrical engineering Electrical machinery, apparatus, energy 0.8061303 0.7812853 0.8859104 1.126869 0.0593493
2 Electrical engineering Audio-visual technology 0.5290351 0.2812052 1.5817366 1.341886 0.0853538
3 Electrical engineering Telecommunications 0.6577273 0.3468912 1.4004752 1.230989 0.3360644
4 Electrical engineering Digital communication 0.7356689 0.2069789 1.3043447 1.244439 1.0978493
5 Electrical engineering Basic communication processes 0.8740121 0.3793871 1.0785703 1.191508 1.6452825
6 Electrical engineering Computer technology 0.6008344 0.5349367 0.9466936 1.371771 16.6484958

Finally, these three interval-specific summary files are merged into a master file named All_periods, shown below. This file includes additional labels for analytical purposes, though these are not central to the paper’s main findings.

head(IPC_names)

In the last step of this sub-section, we use the IPC_RCAs.csv file to generate a summary table (IPC_RCAs_Top4). Here, the continuous RTA values are binarized: any RTA ≥ 1 is considered a specialization (value of 1) and any RTA < 1 is not (value of 0). We then sum these binary indicators to count the number of general specializations, AI-specific specializations, and coinciding specializations (where a country is specialized in both the general field and its AI-specific application) for each country and interval, resulting in the following dataset:

kable(as.data.frame(IPC_RCAs_Top4[1:6,]))
ctry_code techn_field_nr RCA_Gen RCA_AI Period Label Round_general Round_AI Total_RCA
CN 1 0.7812853 0 1974-1988 Electrical machinery, apparatus, energy 0 0 0
CN 10 1.2306427 0 1974-1988 Measurement 1 0 1
CN 11 0.9483087 0 1974-1988 Analysis of biological materials 0 0 0
CN 12 0.7424070 0 1974-1988 Control 0 0 0
CN 13 1.3542371 0 1974-1988 Medical technology 1 0 1
CN 14 0.9327427 0 1974-1988 Organic fine chemistry 0 0 0
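The binarization and counting can be sketched as follows (base R; the RCA column names follow IPC_RCAs, while the coincidence column name is chosen here for illustration):

```r
df <- data.frame(
  ctry_code = "CN",
  RCA_Gen   = c(0.78, 1.23, 0.95, 1.35),
  RCA_AI    = c(0,    0,    1.10, 1.35)
)

df$Round_general <- as.integer(df$RCA_Gen >= 1)
df$Round_AI      <- as.integer(df$RCA_AI  >= 1)
df$Coinciding    <- df$Round_general * df$Round_AI   # specialized in both

# Number of general, AI-specific, and coinciding specializations:
colSums(df[, c("Round_general", "Round_AI", "Coinciding")])
```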

1.2. Building the Global Technological Space (GTS)

The next step is to construct the backbone of our analysis: the Global Technological Space (GTS). This space is a network where nodes represent technological fields, and the links between them signify their relatedness. We measure this relatedness based on the principle that technologies that frequently appear together within the same patent are likely to be related.

1.2.1. From Patents to a Co-occurrence Matrix

To quantify this relationship, we must first count how often every possible pair of technologies co-occurs across the entire patent dataset. We start by loading the complete patent database (which, due to its size, is again handled in chunks) and applying the create_sparse_matrix function. This function generates a very large matrix where rows are unique patents and columns are the 35 technological fields.

The resulting sparse matrix, mat_tech_AI1, records the occurrences of each technological field within each patent (rows are patents, columns are fields, and zero cells indicate absence). It looks like this:

kable(as.matrix(mat_tech_AI1[1:20, 1:12]), caption = "Sample of the Sparse AI matrix")
Sample of the Sparse AI matrix
1 2 3 4 5 6 7 8 9 10 11 12
58 0 0 0 0 0 2 0 0 0 0 0 0
76 0 0 0 0 0 0 0 0 0 0 0 0
111 0 0 0 0 0 0 0 0 0 0 0 0
139 0 0 0 0 0 0 0 0 0 0 0 4
151 0 0 0 0 0 0 0 0 0 0 0 0
159 0 0 0 0 0 0 0 0 0 0 0 0
183 0 0 0 0 0 0 0 0 0 0 0 0
193 0 0 0 0 0 0 0 0 0 0 0 0
200 0 0 0 0 0 0 0 0 0 0 0 0
206 0 0 0 0 0 0 0 0 0 0 0 0
217 0 0 0 0 0 0 0 0 0 0 0 0
218 0 0 0 0 0 0 0 0 0 0 0 0
220 0 0 0 0 1 0 0 0 0 0 0 0
231 1 0 0 0 0 0 0 0 0 0 0 0
243 3 0 0 0 0 0 0 0 0 0 0 0
246 0 0 0 0 0 0 0 0 0 0 0 0
261 0 0 0 0 0 0 0 0 0 0 0 0
266 0 0 0 0 0 0 0 0 0 0 0 0
280 0 0 0 0 0 0 0 0 0 0 0 0
283 0 0 0 0 0 0 0 0 0 0 0 0

By calculating the cross-product of this matrix (t(M) %*% M), we transform it into a 35x35 square co-occurrence matrix. Each cell (i, j) in this new matrix contains a count of how many patents simultaneously list technology i and technology j. This square matrix looks like this:

kable(as.matrix(mat_tech_AI1[1:35, 1:35]), caption = "Sample of the co-occurrence matrix")
Sample of the co-occurrence matrix
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
5768289 180008 72869 35505 31634 105672 48931 293699 162448 224446 6043 98397 35221 26284 5711 2489 110241 2856 77403 250123 124491 58683 82108 29344 44158 57741 95078 21109 81035 73388 75597 213765 29184 26222 59692
180008 2531644 182176 197948 36765 435699 37718 187833 317265 91013 2012 90671 33259 5530 1220 688 28068 155 28650 18549 67392 7150 10257 9675 19330 25924 5935 26245 28330 22945 20173 54606 20783 33300 14997
72869 182176 2229334 679000 58436 355831 48692 13799 109871 138039 4223 95899 24034 622 435 139 1132 61 1243 2934 4601 2666 2396 1464 12865 2658 1948 76191 7767 6641 7014 28756 6655 13681 14208
35505 197948 679000 3345610 45697 596334 160911 2506 4448 81955 2157 110688 12585 912 776 93 117 77 108 95 731 166 688 973 4228 596 1913 1919 3500 3584 913 21166 8249 4925 7775
31634 36765 58436 45697 506310 70807 435 34596 4931 26343 380 12005 2645 672 189 39 25 9 222 1357 1170 2069 1471 442 393 448 914 819 399 535 583 2988 754 1838 743
105672 435699 355831 596334 70807 6260090 553534 95310 109728 269221 26784 283610 127652 6859 21330 4725 2902 1485 5442 10175 14978 3842 10562 7448 33521 12340 19023 80353 24244 12310 15613 70014 52055 38240 35601
48931 37718 48692 160911 435 553534 1701101 926 3984 40824 3310 142899 27438 840 1781 836 20 1255 640 518 466 50 2764 2572 15140 988 2243 4467 6398 2500 1003 15738 14459 5927 9893
293699 187833 13799 2506 34596 95310 926 2907011 220961 114839 2712 10402 12349 102013 2263 495 78495 128 184747 59457 199001 53804 45908 14057 22677 62331 10748 16159 34466 19714 5928 6698 1232 8552 7164
162448 317265 109871 4448 4931 109728 3984 220961 2596916 106349 2321 22262 45652 25607 3488 1208 103336 28 86394 25865 88924 25285 18939 5130 47088 21196 8940 82423 75161 5256 24813 19059 4992 13602 5614
224446 91013 138039 81955 26343 269221 40824 114839 106349 5893276 237979 213472 103998 43521 104810 29536 16762 5190 47689 31626 33031 50026 95527 35712 45455 40922 53426 18286 43759 27833 56718 138519 17664 22110 80910
6043 2012 4223 2157 380 26784 3310 2712 2321 237979 810267 8274 27172 48107 261182 96945 5074 7045 10954 5843 4142 9304 24371 6285 2166 1909 2287 2120 15562 1142 1111 2364 730 1079 8314
98397 90671 95899 110688 12005 283610 142899 10402 22262 213472 8274 2064137 43797 1902 1012 7803 1243 2340 5778 5974 3131 680 13263 16084 44920 20787 17820 8993 26462 23917 18569 132219 46615 26491 49748
35221 33259 24034 12585 2645 127652 27438 12349 45652 103998 27172 43797 2429739 16945 35668 125523 31251 7933 25452 19786 31515 8844 59686 46015 48773 14450 20017 22087 52776 21157 14402 19788 54425 44840 10065
26284 5530 622 912 672 6859 840 102013 25607 43521 48107 1902 16945 2867623 218485 880680 142222 72141 366020 46278 19938 9300 345426 30433 2091 4094 1825 14666 19353 2021 1537 1447 1383 10727 1578
5711 1220 435 776 189 21330 1781 2263 3488 104810 261182 1012 35668 218485 2548809 536552 39796 301160 132050 8297 5668 10986 43438 67992 2818 10729 2211 11092 67652 1160 1394 659 404 5087 1955
2489 688 139 93 39 4725 836 495 1208 29536 96945 7803 125523 880680 536552 3083488 65679 234071 85334 13210 6219 17217 24794 3463 2974 1429 1446 4794 34654 355 181 425 2645 4469 240
110241 28068 1132 117 25 2902 20 78495 103336 16762 5074 1243 31251 142222 39796 65679 2049722 20411 343204 68382 104006 14426 94806 28112 20540 9678 4330 88256 501786 2040 23407 36789 6147 22655 24444
2856 155 61 77 9 1485 1255 128 28 5190 7045 2340 7933 72141 301160 234071 20411 1898838 59224 3902 1959 774 26817 9471 15336 2731 264 2880 128187 4135 2318 268 11201 7095 750
77403 28650 1243 108 222 5442 640 184747 86394 47689 10954 5778 25452 366020 132050 85334 343204 59224 3281112 126796 133673 28416 223167 106935 15567 32978 13952 101627 217172 28914 20605 9411 7559 37134 77863
250123 18549 2934 95 1357 10175 518 59457 25865 31626 5843 5974 19786 46278 8297 13210 68382 3902 126796 3478542 211364 137164 225615 137648 8314 183631 36572 20432 114726 66431 32638 11411 2704 9178 81833
124491 67392 4601 731 1170 14978 466 199001 88924 33031 4142 3131 31515 19938 5668 6219 104006 1959 133673 211364 1789373 35420 95519 27346 49898 77804 25745 67269 135262 13007 32746 24564 9689 43825 44971
58683 7150 2666 166 2069 3842 50 53804 25285 50026 9304 680 8844 9300 10986 17217 14426 774 28416 137164 35420 405306 39145 5423 1106 4918 2601 10598 11263 1510 2071 616 149 1395 368
82108 10257 2396 688 1471 10562 2764 45908 18939 95527 24371 13263 59686 345426 43438 24794 94806 26817 223167 225615 95519 39145 2731402 479200 47416 48496 45189 78191 90207 51070 31742 18160 23655 36675 39360
29344 9675 1464 973 442 7448 2572 14057 5130 35712 6285 16084 46015 30433 67992 3463 28112 9471 106935 137648 27346 5423 479200 2069291 14231 24015 83658 8483 46258 84902 14969 28141 7024 8805 52309
44158 19330 12865 4228 393 33521 15140 22677 47088 45455 2166 44920 48773 2091 2818 2974 20540 15336 15567 8314 49898 1106 47416 14231 1786154 54803 10446 51364 57020 9815 60074 63146 39448 41268 47536
57741 25924 2658 596 448 12340 988 62331 21196 40922 1909 20787 14450 4094 10729 1429 9678 2731 32978 183631 77804 4918 48496 24015 54803 2329323 32652 15113 61081 22827 74156 34357 9904 14452 31784
95078 5935 1948 1913 914 19023 2243 10748 8940 53426 2287 17820 20017 1825 2211 1446 4330 264 13952 36572 25745 2601 45189 83658 10446 32652 1816529 2404 17697 74745 123929 100390 5100 7191 39369
21109 26245 76191 1919 819 80353 4467 16159 82423 18286 2120 8993 22087 14666 11092 4794 88256 2880 101627 20432 67269 10598 78191 8483 51364 15113 2404 1308730 42673 1881 8205 6575 13946 59257 7957
81035 28330 7767 3500 399 24244 6398 34466 75161 43759 15562 26462 52776 19353 67652 34654 501786 128187 217172 114726 135262 11263 90207 46258 57020 61081 17697 42673 3126645 15300 49310 69364 18062 29144 90968
73388 22945 6641 3584 535 12310 2500 19714 5256 27833 1142 23917 21157 2021 1160 355 2040 4135 28914 66431 13007 1510 51070 84902 9815 22827 74745 1881 15300 1463621 30728 33966 23370 41410 31058
75597 20173 7014 913 583 15613 1003 5928 24813 56718 1111 18569 14402 1537 1394 181 23407 2318 20605 32638 32746 2071 31742 14969 60074 74156 123929 8205 49310 30728 1831696 212337 19557 16086 103204
213765 54606 28756 21166 2988 70014 15738 6698 19059 138519 2364 132219 19788 1447 659 425 36789 268 9411 11411 24564 616 18160 28141 63146 34357 100390 6575 69364 33966 212337 2772775 40099 19890 113411
29184 20783 6655 8249 754 52055 14459 1232 4992 17664 730 46615 54425 1383 404 2645 6147 11201 7559 2704 9689 149 23655 7024 39448 9904 5100 13946 18062 23370 19557 40099 1236189 41595 36894
26222 33300 13681 4925 1838 38240 5927 8552 13602 22110 1079 26491 44840 10727 5087 4469 22655 7095 37134 9178 43825 1395 36675 8805 41268 14452 7191 59257 29144 41410 16086 19890 41595 1084525 19702
59692 14997 14208 7775 743 35601 9893 7164 5614 80910 8314 49748 10065 1578 1955 240 24444 750 77863 81833 44971 368 39360 52309 47536 31784 39369 7957 90968 31058 103204 113411 36894 19702 3560637
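The cross-product step can be reproduced in base R on a toy patent-by-field incidence matrix (binary here for clarity; crossprod(M) is equivalent to t(M) %*% M):

```r
M <- rbind(            # rows = patents, columns = technological fields
  p1 = c(1, 1, 0),
  p2 = c(1, 0, 1),
  p3 = c(0, 1, 0)
)

co <- crossprod(M)     # 3x3 field-by-field co-occurrence matrix
co                     # cell (i, j): patents listing both field i and j
```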

After processing all data chunks, the individual co-occurrence matrices are summed to create a final, comprehensive matrix, which is then saved as Matrix_IPC.csv. This file looks like this:

kable(as.matrix(mat_tech_AI_Final[1:35, 1:35]))
1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25 26 27 28 29 3 30 31 32 33 34 35 4 5 6 7 8 9
1 26242833 1105050 36467 460300 170587 113955 20807 10445 533998 13709 443392 1119478 1469546 724137 219144 419139 145875 237008 327159 460192 142474 432784 362362 386632 400546 915437 146905 128408 278714 143077 202650 445672 189070 1432517 832079
10 1105050 24682955 1084761 1050741 563614 204173 409432 106582 77728 22386 201039 563726 174826 157456 173075 524854 184676 244039 229919 330778 114862 238130 613474 135633 274385 662630 82825 127036 366014 325019 169190 1223744 160542 686541 615023
11 36467 1084761 3525353 34621 131313 253957 1242197 563632 30836 30137 52245 8619 26056 18887 27249 149002 37931 12659 7065 16227 13331 91847 16588 7282 7332 10983 3331 6630 36342 8870 1162 111402 16033 12597 15757
12 460300 1050741 34621 8999453 164516 8290 6446 22004 7817 10141 23760 438005 38608 22339 3116 76178 72363 304054 176239 119692 64315 145060 454421 131213 126129 612923 219052 145379 230550 479100 71090 1314660 735780 73831 123855
13 170587 563614 131313 164516 10467208 97468 155323 563556 179377 44737 152775 198152 100743 153392 27220 325463 213020 217590 74190 94111 127019 252392 86787 92050 71485 88497 261890 209610 43541 46635 15821 492137 120184 50575 349238
14 113955 204173 253957 8290 97468 15116843 1152360 4615828 854888 337538 2235446 47780 231642 104417 31400 1755442 130941 15050 32237 13592 121893 105799 4913 13022 8727 7206 9877 61408 8633 4488 2653 32031 3834 365327 252442
15 20807 409432 1242197 6446 155323 1152360 10401074 2707298 181385 1089819 541154 6197 35052 22737 33307 221998 260273 11041 36376 11044 50116 273218 2593 6068 5164 3185 1754 20616 10697 3065 992 93726 7143 9801 17505
16 10445 106582 563632 22004 563556 4615828 2707298 13025114 316366 896097 390863 4455 67891 24447 56329 152120 17747 15281 8564 7183 22389 160764 2314 2210 3323 3009 11232 18056 1797 1814 290 20856 3661 3727 11026
17 533998 77728 30836 7817 179377 854888 181385 316366 10676610 93923 1944663 240044 362392 579773 46352 527703 125750 125085 56172 24044 488151 2274295 6304 11193 125667 157364 37219 102186 119196 1398 869 11786 1054 379736 640378
18 13709 22386 30137 10141 44737 337538 1089819 896097 93923 7317728 265045 1573 21057 15419 2554 153172 47742 85630 14007 2876 16674 531516 725 19936 5232 2287 52233 39290 2513 458 190 7036 4409 1018 1250
19 443392 201039 52245 23760 152775 2235446 541154 390863 1944663 265045 15422287 356341 685512 733594 101637 1108478 494356 87337 184865 88232 601362 1012771 9616 146983 126105 59265 42519 179676 364466 1428 1584 31035 2739 703190 591978
2 1119478 563726 8619 438005 198152 47780 6197 4455 240044 1573 356341 14335959 199276 506317 30510 68765 45202 147285 248275 31077 296926 232699 993581 90727 115040 239762 99615 210021 88928 798019 300865 2073525 166538 1132716 1778205
20 1469546 174826 26056 38608 100743 231642 35052 67891 362392 21057 685512 199276 16386118 1158401 438789 1165654 675668 57671 922532 232388 145835 662971 24974 463218 200267 71269 19752 51377 386575 895 7543 39318 5415 384304 204091
21 724137 157456 18887 22339 153392 104417 22737 24447 579773 15419 733594 506317 1158401 9278629 112480 555537 145430 295880 425365 135681 373035 890035 21894 90091 209204 139276 62699 239564 262458 1899 8101 55552 1756 1171768 496639
22 219144 173075 27249 3116 27220 31400 33307 56329 46352 2554 101637 30510 438789 112480 1324026 133465 21055 5819 16226 12957 36658 36419 10691 4051 9998 3684 315 4544 1693 584 6949 12485 156 184721 90594
23 419139 524854 149002 76178 325463 1755442 221998 152120 527703 153172 1108478 68765 1165654 555537 133465 13214230 2323343 276808 263347 265500 430127 548439 15323 302186 168414 92185 126170 174306 215646 4346 8945 52088 10773 282961 135042
24 145875 184676 37931 72363 213020 130941 260273 17747 125750 47742 494356 45202 675668 145430 21055 2323343 9199630 75744 138952 460881 58240 222304 13785 446346 85347 133339 49967 59730 271865 5944 5790 35730 19378 73581 39531
25 237008 244039 12659 304054 217590 15050 11041 15281 125085 85630 87337 147285 57671 295880 5819 276808 75744 9267812 360892 62498 397629 374961 93698 55575 293059 316170 190775 221657 262033 15775 2484 174082 109065 139063 447847
26 327159 229919 7065 176239 74190 32237 36376 8564 56172 14007 184865 248275 922532 425365 16226 263347 138952 360892 11407587 174310 109204 347920 16900 127266 387707 170519 52267 84352 187657 2672 4826 64361 21471 348827 140327
27 460192 330778 16227 119692 94111 13592 11044 7183 24044 2876 88232 31077 232388 135681 12957 265500 460881 62498 174310 9032042 15843 88766 13427 374929 659505 542172 27189 35329 178042 6855 7045 81588 11568 64121 60694
28 142474 114862 13331 64315 127019 121893 50116 22389 488151 16674 601362 296926 145835 373035 36658 430127 58240 397629 109204 15843 8187901 299707 477030 14857 59083 41079 76670 340332 45916 15540 6822 501207 28035 93685 702628
29 432784 238130 91847 145060 252392 105799 273218 160764 2274295 531516 1012771 232699 662971 890035 36419 548439 222304 374961 347920 88766 299707 15132833 39136 87262 321723 405485 114989 184052 470602 13139 3479 104021 33017 168451 462908
3 362362 613474 16588 454421 86787 4913 2593 2314 6304 725 9616 993581 24974 21894 10691 15323 13785 93698 16900 13427 477030 39136 10554590 37350 34597 153809 31671 79752 71338 2820312 406133 1706302 220822 112782 760568
30 386632 135633 7282 131213 92050 13022 6068 2210 11193 19936 146983 90727 463218 90091 4051 302186 446346 55575 127266 374929 14857 87262 37350 7500577 170350 167141 134233 209624 166601 13797 4839 53130 12660 101086 24403
31 400546 274385 7332 126129 71485 8727 5164 3323 125667 5232 126105 115040 200267 209204 9998 168414 85347 293059 387707 659505 59083 321723 34597 170350 8975847 1094346 95790 86359 577745 3925 5318 73536 3825 34116 156239
32 915437 662630 10983 612923 88497 7206 3185 3009 157364 2287 59265 239762 71269 139276 3684 92185 133339 316170 170519 542172 41079 405485 153809 167141 1094346 12649692 175913 120849 544994 93460 18938 275421 64206 32146 92514
33 146905 82825 3331 219052 261890 9877 1754 11232 37219 52233 42519 99615 19752 62699 315 126170 49967 190775 52267 27189 76670 114989 31671 134233 95790 175913 5758204 219012 209748 38328 3621 208961 55401 5833 22884
34 128408 127036 6630 145379 209610 61408 20616 18056 102186 39290 179676 210021 51377 239564 4544 174306 59730 221657 84352 35329 340332 184052 79752 209624 86359 120849 219012 5290424 120742 30035 22367 242998 39460 45044 97315
35 278714 366014 36342 230550 43541 8633 10697 1797 119196 2513 364466 88928 386575 262458 1693 215646 271865 262033 187657 178042 45916 470602 71338 166601 577745 544994 209748 120742 15805378 29060 4422 155283 59391 40477 32646
4 143077 325019 8870 479100 46635 4488 3065 1814 1398 458 1428 798019 895 1899 584 4346 5944 15775 2672 6855 15540 13139 2820312 13797 3925 93460 38328 30035 29060 12132499 261134 2358962 648691 11314 27124
5 202650 169190 1162 71090 15821 2653 992 290 869 190 1584 300865 7543 8101 6949 8945 5790 2484 4826 7045 6822 3479 406133 4839 5318 18938 3621 22367 4422 261134 2871031 435879 3267 212314 34443
6 445672 1223744 111402 1314660 492137 32031 93726 20856 11786 7036 31035 2073525 39318 55552 12485 52088 35730 174082 64361 81588 501207 104021 1706302 53130 73536 275421 208961 242998 155283 2358962 435879 25287889 2302961 544407 544095
7 189070 160542 16033 735780 120184 3834 7143 3661 1054 4409 2739 166538 5415 1756 156 10773 19378 109065 21471 11568 28035 33017 220822 12660 3825 64206 55401 39460 59391 648691 3267 2302961 6633202 7694 19252
8 1432517 686541 12597 73831 50575 365327 9801 3727 379736 1018 703190 1132716 384304 1171768 184721 282961 73581 139063 348827 64121 93685 168451 112782 101086 34116 32146 5833 45044 40477 11314 212314 544407 7694 13624917 1366521
9 832079 615023 15757 123855 349238 252442 17505 11026 640378 1250 591978 1778205 204091 496639 90594 135042 39531 447847 140327 60694 702628 462908 760568 24403 156239 92514 22884 97315 32646 27124 34443 544095 19252 1366521 15211979

1.2.2. Calculating Relatedness and Defining the Network

Raw co-occurrence counts can be misleading, as highly prevalent technologies will naturally co-occur more often with others, inflating their apparent relatedness. To correct for this, we normalize the matrix using the relatedness() function from the EconGeo package, which employs a cosine similarity index. The result is a relatedness matrix, where each value represents the strength of the relationship between two technologies. It looks like this:

kable(as.matrix(mat_tech_rel_AI[1:35, 1:35]))
1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25 26 27 28 29 3 30 31 32 33 34 35 4 5 6 7 8 9
1 0.0000000 0.0799467 0.0047211 0.0428721 0.0184079 0.0082334 0.0018598 0.0008350 0.0429860 0.0018410 0.0311911 0.0820082 0.1179296 0.0604547 0.0418364 0.0305216 0.0143538 0.0260261 0.0358266 0.0557421 0.0148884 0.0333604 0.0305611 0.0501727 0.0429894 0.0878084 0.0227148 0.0169696 0.0303031 0.0131214 0.0355793 0.0294009 0.0220326 0.1174738 0.0653999
10 0.0799467 0.0000000 0.1470137 0.1024491 0.0636678 0.0154428 0.0383112 0.0089198 0.0065500 0.0031470 0.0148048 0.0432303 0.0146867 0.0137609 0.0345890 0.0400098 0.0190229 0.0280533 0.0263573 0.0419430 0.0125652 0.0192156 0.0541629 0.0184253 0.0308282 0.0665361 0.0134064 0.0175746 0.0416586 0.0312030 0.0310960 0.0845112 0.0195844 0.0589368 0.0506037
11 0.0047211 0.1470137 0.0000000 0.0060406 0.0265443 0.0343727 0.2079985 0.0844101 0.0046500 0.0075813 0.0068848 0.0011828 0.0039170 0.0029538 0.0097450 0.0203257 0.0069918 0.0026041 0.0014493 0.0036820 0.0026096 0.0132627 0.0026208 0.0017702 0.0014741 0.0019735 0.0009648 0.0016413 0.0074019 0.0015238 0.0003822 0.0137671 0.0035000 0.0019351 0.0023200
12 0.0428721 0.1024491 0.0060406 0.0000000 0.0239255 0.0008072 0.0007765 0.0023708 0.0008481 0.0018353 0.0022526 0.0432429 0.0041755 0.0025134 0.0008017 0.0074761 0.0095962 0.0449977 0.0260101 0.0195390 0.0090577 0.0150696 0.0516510 0.0229477 0.0182439 0.0792333 0.0456472 0.0258926 0.0337822 0.0592146 0.0168210 0.1168833 0.1155540 0.0081597 0.0131196
13 0.0184079 0.0636678 0.0265443 0.0239255 0.0000000 0.0109959 0.0216780 0.0703477 0.0225462 0.0093805 0.0167809 0.0226652 0.0126233 0.0199955 0.0081140 0.0370058 0.0327285 0.0373082 0.0126856 0.0177993 0.0207253 0.0303777 0.0114288 0.0186514 0.0119796 0.0132543 0.0632283 0.0432526 0.0073917 0.0066779 0.0043371 0.0506933 0.0218680 0.0064758 0.0428601
14 0.0082334 0.0154428 0.0343727 0.0008072 0.0109959 0.0000000 0.1076864 0.3857904 0.0719457 0.0473881 0.1644049 0.0036593 0.0194341 0.0091136 0.0062670 0.1336422 0.0134701 0.0017278 0.0036907 0.0017212 0.0133168 0.0085261 0.0004332 0.0017667 0.0009792 0.0007226 0.0015966 0.0084843 0.0009813 0.0004303 0.0004870 0.0022091 0.0004671 0.0313207 0.0207435
15 0.0018598 0.0383112 0.2079985 0.0007765 0.0216780 0.1076864 0.0000000 0.2799333 0.0188849 0.1892857 0.0492366 0.0005871 0.0036381 0.0024551 0.0082240 0.0209085 0.0331239 0.0015681 0.0051521 0.0017302 0.0067735 0.0272392 0.0002828 0.0010184 0.0007168 0.0003951 0.0003508 0.0035238 0.0015042 0.0003636 0.0002253 0.0079971 0.0010766 0.0010395 0.0017795
16 0.0008350 0.0089198 0.0844101 0.0023708 0.0703477 0.3857904 0.2799333 0.0000000 0.0294599 0.1392026 0.0318069 0.0003775 0.0063024 0.0023610 0.0124397 0.0128141 0.0020201 0.0019411 0.0010849 0.0010065 0.0027065 0.0143352 0.0002258 0.0003318 0.0004126 0.0003339 0.0020090 0.0027603 0.0002260 0.0001924 0.0000589 0.0015916 0.0004935 0.0003536 0.0010025
17 0.0429860 0.0065500 0.0046500 0.0008481 0.0225462 0.0719457 0.0188849 0.0294599 0.0000000 0.0146912 0.1593436 0.0204824 0.0338740 0.0563787 0.0103072 0.0447596 0.0144126 0.0159992 0.0071650 0.0033923 0.0594177 0.2041996 0.0006193 0.0016919 0.0157101 0.0175817 0.0067032 0.0157297 0.0150952 0.0001493 0.0001777 0.0009056 0.0001431 0.0362719 0.0586268
18 0.0018410 0.0031470 0.0075813 0.0018353 0.0093805 0.0473881 0.1892857 0.1392026 0.0146912 0.0000000 0.0362294 0.0002239 0.0032835 0.0025013 0.0009474 0.0216734 0.0091282 0.0182713 0.0029805 0.0006769 0.0033857 0.0796114 0.0001188 0.0050270 0.0010911 0.0004263 0.0156934 0.0100893 0.0005309 0.0000816 0.0000648 0.0009019 0.0009983 0.0001622 0.0001909
19 0.0311911 0.0148048 0.0068848 0.0022526 0.0167809 0.1644049 0.0492366 0.0318069 0.1593436 0.0362294 0.0000000 0.0265711 0.0559960 0.0623401 0.0197506 0.0821635 0.0495142 0.0097622 0.0206065 0.0108786 0.0639664 0.0794647 0.0008255 0.0194151 0.0137767 0.0057864 0.0066920 0.0241698 0.0403356 0.0001333 0.0002831 0.0020840 0.0003249 0.0586971 0.0473610
2 0.0820082 0.0432303 0.0011828 0.0432429 0.0226652 0.0036593 0.0005871 0.0003775 0.0204824 0.0002239 0.0265711 0.0000000 0.0169510 0.0448057 0.0061740 0.0053078 0.0047146 0.0171437 0.0288192 0.0039901 0.0328900 0.0190133 0.0888243 0.0124798 0.0130876 0.0243775 0.0163267 0.0294202 0.0102487 0.0775754 0.0559918 0.1449960 0.0205711 0.0984609 0.1481480
20 0.1179296 0.0146867 0.0039170 0.0041755 0.0126233 0.0194341 0.0036381 0.0063024 0.0338740 0.0032835 0.0559960 0.0169510 0.0000000 0.1122969 0.0972703 0.0985640 0.0772005 0.0073536 0.1173081 0.0326857 0.0176960 0.0593409 0.0024458 0.0697998 0.0249585 0.0079379 0.0035464 0.0078840 0.0488047 0.0000953 0.0015378 0.0030119 0.0007327 0.0365945 0.0186267
21 0.0604547 0.0137609 0.0029538 0.0025134 0.0199955 0.0091136 0.0024551 0.0023610 0.0563787 0.0025013 0.0623401 0.0448057 0.1122969 0.0000000 0.0259400 0.0488688 0.0172866 0.0392492 0.0562701 0.0198533 0.0470904 0.0828775 0.0022306 0.0141228 0.0271237 0.0161381 0.0117112 0.0382447 0.0344713 0.0002104 0.0017181 0.0044270 0.0002472 0.1160786 0.0471544
22 0.0418364 0.0345890 0.0097450 0.0008017 0.0081140 0.0062670 0.0082240 0.0124397 0.0103072 0.0009474 0.0197506 0.0061740 0.0972703 0.0259400 0.0000000 0.0268474 0.0057230 0.0017651 0.0049084 0.0043354 0.0105820 0.0077548 0.0024908 0.0014522 0.0029642 0.0009761 0.0001345 0.0016588 0.0005085 0.0001479 0.0033702 0.0022752 0.0000502 0.0418449 0.0196697
23 0.0305216 0.0400098 0.0203257 0.0074761 0.0370058 0.1336422 0.0209085 0.0128141 0.0447596 0.0216734 0.0821635 0.0053078 0.0985640 0.0488688 0.0268474 0.0000000 0.2408849 0.0320283 0.0303868 0.0338858 0.0473609 0.0445449 0.0013617 0.0413193 0.0190457 0.0093170 0.0205560 0.0242718 0.0247047 0.0004200 0.0016548 0.0036207 0.0013228 0.0244499 0.0111838
24 0.0143538 0.0190229 0.0069918 0.0095962 0.0327285 0.0134701 0.0331239 0.0020201 0.0144126 0.0091282 0.0495142 0.0047146 0.0772005 0.0172866 0.0057230 0.2408849 0.0000000 0.0118424 0.0216650 0.0794840 0.0086653 0.0243980 0.0016553 0.0824685 0.0130420 0.0182101 0.0110003 0.0112388 0.0420851 0.0007761 0.0014474 0.0033560 0.0032151 0.0085912 0.0044238
25 0.0260261 0.0280533 0.0026041 0.0449977 0.0373082 0.0017278 0.0015681 0.0019411 0.0159992 0.0182713 0.0097622 0.0171437 0.0073536 0.0392492 0.0017651 0.0320283 0.0118424 0.0000000 0.0627957 0.0120286 0.0660233 0.0459253 0.0125563 0.0114592 0.0499770 0.0481874 0.0468705 0.0465444 0.0452679 0.0022987 0.0006930 0.0182475 0.0201945 0.0181200 0.0559304
26 0.0358266 0.0263573 0.0014493 0.0260101 0.0126856 0.0036907 0.0051521 0.0010849 0.0071650 0.0029805 0.0206065 0.0288192 0.1173081 0.0562701 0.0049084 0.0303868 0.0216650 0.0627957 0.0000000 0.0334559 0.0180825 0.0424959 0.0022585 0.0261691 0.0659356 0.0259172 0.0128058 0.0176637 0.0323296 0.0003883 0.0013426 0.0067278 0.0039646 0.0453271 0.0174767
27 0.0557421 0.0419430 0.0036820 0.0195390 0.0177993 0.0017212 0.0017302 0.0010065 0.0033923 0.0006769 0.0108786 0.0039901 0.0326857 0.0198533 0.0043354 0.0338858 0.0794840 0.0120286 0.0334559 0.0000000 0.0029017 0.0119925 0.0019848 0.0852751 0.1240599 0.0911484 0.0073684 0.0081831 0.0339277 0.0011018 0.0021679 0.0094336 0.0023627 0.0092161 0.0083611
28 0.0148884 0.0125652 0.0026096 0.0090577 0.0207253 0.0133168 0.0067735 0.0027065 0.0594177 0.0033857 0.0639664 0.0328900 0.0176960 0.0470904 0.0105820 0.0473609 0.0086653 0.0660233 0.0180825 0.0029017 0.0000000 0.0349326 0.0608340 0.0029152 0.0095884 0.0059580 0.0179255 0.0680075 0.0075486 0.0021549 0.0018111 0.0499961 0.0049399 0.0116168 0.0835048
29 0.0333604 0.0192156 0.0132627 0.0150696 0.0303777 0.0085261 0.0272392 0.0143352 0.2041996 0.0796114 0.0794647 0.0190133 0.0593409 0.0828775 0.0077548 0.0445449 0.0243980 0.0459253 0.0424959 0.0119925 0.0349326 0.0000000 0.0036815 0.0126303 0.0385134 0.0433813 0.0198312 0.0271295 0.0570693 0.0013440 0.0006813 0.0076540 0.0042914 0.0154076 0.0405814
3 0.0305611 0.0541629 0.0026208 0.0516510 0.0114288 0.0004332 0.0002828 0.0002258 0.0006193 0.0001188 0.0008255 0.0888243 0.0024458 0.0022306 0.0024908 0.0013617 0.0016553 0.0125563 0.0022585 0.0019848 0.0608340 0.0036815 0.0000000 0.0059149 0.0045314 0.0180043 0.0059762 0.0128620 0.0094653 0.3156408 0.0870174 0.1373687 0.0314031 0.0112867 0.0729520
30 0.0501727 0.0184253 0.0017702 0.0229477 0.0186514 0.0017667 0.0010184 0.0003318 0.0016919 0.0050270 0.0194151 0.0124798 0.0697998 0.0141228 0.0014522 0.0413193 0.0824685 0.0114592 0.0261691 0.0852751 0.0029152 0.0126303 0.0059149 0.0000000 0.0343305 0.0301037 0.0389728 0.0520177 0.0340123 0.0023759 0.0015953 0.0065813 0.0027702 0.0155654 0.0036015
31 0.0429894 0.0308282 0.0014741 0.0182439 0.0119796 0.0009792 0.0007168 0.0004126 0.0157101 0.0010911 0.0137767 0.0130876 0.0249585 0.0271237 0.0029642 0.0190457 0.0130420 0.0499770 0.0659356 0.1240599 0.0095884 0.0385134 0.0045314 0.0343305 0.0000000 0.1630165 0.0230018 0.0177238 0.0975515 0.0005590 0.0014500 0.0075338 0.0006922 0.0043448 0.0190709
32 0.0878084 0.0665361 0.0019735 0.0792333 0.0132543 0.0007226 0.0003951 0.0003339 0.0175817 0.0004263 0.0057864 0.0243775 0.0079379 0.0161381 0.0009761 0.0093170 0.0182101 0.0481874 0.0259172 0.0911484 0.0059580 0.0433813 0.0180043 0.0301037 0.1630165 0.0000000 0.0377519 0.0221662 0.0822409 0.0118960 0.0046148 0.0252180 0.0103845 0.0036588 0.0100922
33 0.0227148 0.0134064 0.0009648 0.0456472 0.0632283 0.0015966 0.0003508 0.0020090 0.0067032 0.0156934 0.0066920 0.0163267 0.0035464 0.0117112 0.0001345 0.0205560 0.0110003 0.0468705 0.0128058 0.0073684 0.0179255 0.0198312 0.0059762 0.0389728 0.0230018 0.0377519 0.0000000 0.0647562 0.0510222 0.0078643 0.0014224 0.0308421 0.0144442 0.0010702 0.0040242
34 0.0169696 0.0175746 0.0016413 0.0258926 0.0432526 0.0084843 0.0035238 0.0027603 0.0157297 0.0100893 0.0241698 0.0294202 0.0078840 0.0382447 0.0016588 0.0242718 0.0112388 0.0465444 0.0176637 0.0081831 0.0680075 0.0271295 0.0128620 0.0520177 0.0177238 0.0221662 0.0647562 0.0000000 0.0251031 0.0052672 0.0075093 0.0306541 0.0087931 0.0070635 0.0146262
35 0.0303031 0.0416586 0.0074019 0.0337822 0.0073917 0.0009813 0.0015042 0.0002260 0.0150952 0.0005309 0.0403356 0.0102487 0.0488047 0.0344713 0.0005085 0.0247047 0.0420851 0.0452679 0.0323296 0.0339277 0.0075486 0.0570693 0.0094653 0.0340123 0.0975515 0.0822409 0.0510222 0.0251031 0.0000000 0.0041927 0.0012214 0.0161160 0.0108881 0.0052220 0.0040367
4 0.0131214 0.0312030 0.0015238 0.0592146 0.0066779 0.0004303 0.0003636 0.0001924 0.0001493 0.0000816 0.0001333 0.0775754 0.0000953 0.0002104 0.0001479 0.0004200 0.0007761 0.0022987 0.0003883 0.0011018 0.0021549 0.0013440 0.3156408 0.0023759 0.0005590 0.0118960 0.0078643 0.0052672 0.0041927 0.0000000 0.0608392 0.2065071 0.1003113 0.0012312 0.0028290
5 0.0355793 0.0310960 0.0003822 0.0168210 0.0043371 0.0004870 0.0002253 0.0000589 0.0001777 0.0000648 0.0002831 0.0559918 0.0015378 0.0017181 0.0033702 0.0016548 0.0014474 0.0006930 0.0013426 0.0021679 0.0018111 0.0006813 0.0870174 0.0015953 0.0014500 0.0046148 0.0014224 0.0075093 0.0012214 0.0608392 0.0000000 0.0730503 0.0009672 0.0442314 0.0068774
6 0.0294009 0.0845112 0.0137671 0.1168833 0.0506933 0.0022091 0.0079971 0.0015916 0.0009056 0.0009019 0.0020840 0.1449960 0.0030119 0.0044270 0.0022752 0.0036207 0.0033560 0.0182475 0.0067278 0.0094336 0.0499961 0.0076540 0.1373687 0.0065813 0.0075338 0.0252180 0.0308421 0.0306541 0.0161160 0.2065071 0.0730503 0.0000000 0.2561738 0.0426157 0.0408218
7 0.0220326 0.0195844 0.0035000 0.1155540 0.0218680 0.0004671 0.0010766 0.0004935 0.0001431 0.0009983 0.0003249 0.0205711 0.0007327 0.0002472 0.0000502 0.0013228 0.0032151 0.0201945 0.0039646 0.0023627 0.0049399 0.0042914 0.0314031 0.0027702 0.0006922 0.0103845 0.0144442 0.0087931 0.0108881 0.1003113 0.0009672 0.2561738 0.0000000 0.0010639 0.0025515
8 0.1174738 0.0589368 0.0019351 0.0081597 0.0064758 0.0313207 0.0010395 0.0003536 0.0362719 0.0001622 0.0586971 0.0984609 0.0365945 0.1160786 0.0418449 0.0244499 0.0085912 0.0181200 0.0453271 0.0092161 0.0116168 0.0154076 0.0112867 0.0155654 0.0043448 0.0036588 0.0010702 0.0070635 0.0052220 0.0012312 0.0442314 0.0426157 0.0010639 0.0000000 0.1274471
9 0.0653999 0.0506037 0.0023200 0.0131196 0.0428601 0.0207435 0.0017795 0.0010025 0.0586268 0.0001909 0.0473610 0.1481480 0.0186267 0.0471544 0.0196697 0.0111838 0.0044238 0.0559304 0.0174767 0.0083611 0.0835048 0.0405814 0.0729520 0.0036015 0.0190709 0.0100922 0.0040242 0.0146262 0.0040367 0.0028290 0.0068774 0.0408218 0.0025515 0.1274471 0.0000000
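
To make the normalization step concrete, the cosine idea behind it can be illustrated on a toy matrix. Note that this is only a sketch of the cosine-similarity principle; the actual computation in the pipeline is done by relatedness() from the EconGeo package, and the values below are invented.

```r
# Minimal sketch of cosine-normalised relatedness on a toy 3x3 co-occurrence matrix.
# Values are illustrative, not the paper's data; EconGeo::relatedness() is used in practice.
co <- matrix(c( 0, 30,  5,
               30,  0,  2,
                5,  2,  0), nrow = 3, byrow = TRUE,
             dimnames = list(1:3, 1:3))
norms <- sqrt(rowSums(co^2))                 # length of each field's co-occurrence profile
rel <- (co %*% t(co)) / outer(norms, norms)  # cosine similarity between profiles
diag(rel) <- 0                               # self-relatedness set to zero, as in the matrix above
round(rel, 3)
```

As in the printed relatedness matrix, the result is symmetric, bounded between 0 and 1, and has a zero diagonal.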

With the relatedness matrix complete, we can now treat it as an adjacency matrix to build a network graph (g_tech_AI). Each node's eigenvector centrality is calculated to capture its importance in the network. For visual clarity in later plots, links with below-average weight (relatedness) are filtered out. Finally, a Fruchterman-Reingold layout algorithm determines the spatial coordinates (coords_tech_AI) of each node for visualization, resulting in the following coordinates:

kable(as.data.frame(coords_tech_AI[1:10,]))
x y
135.6454 65.62816
134.0933 64.22519
135.2521 58.81728
131.3472 67.12986
135.1214 62.67884
138.8334 59.25243
137.6120 58.32931
138.0070 57.48352
136.7269 61.15738
139.9433 57.74551
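
The graph-construction step just described can be sketched with igraph on a toy matrix. This is a stand-in for the real pipeline: the matrix values are random, and the centrality is stored under the name dgr only to mirror the attribute used in the plotting code below.

```r
library(igraph)
# Toy sketch: relatedness matrix -> weighted graph -> centrality -> pruned FR layout.
set.seed(42)
rel <- matrix(runif(36), 6, 6); rel <- (rel + t(rel)) / 2; diag(rel) <- 0
g <- graph_from_adjacency_matrix(rel, mode = "undirected", weighted = TRUE)
V(g)$dgr <- eigen_centrality(g, weights = E(g)$weight)$vector  # node importance
g_vis <- delete_edges(g, E(g)[weight < mean(E(g)$weight)])     # drop weak links for plotting
coords <- layout_with_fr(g_vis, weights = E(g_vis)$weight)     # x/y coordinate per node
head(coords)
```

The pruning only affects the visualization; centrality is computed on the full weighted graph.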

1.2.3. Preparing Data for Visualization

In the final step of this section, we prepare the specialization data (calculated in Section 1.1) for plotting onto the GTS. We load the summary file (RCA_4countries_detailed.csv) and create a new categorical variable (Var1) that classifies each country-technology pair into one of four states: no specialization (0), general specialization (1), AI-specific (break-through) specialization (2), or coinciding (break-in) specialization (3). This will allow us to map the countries’ technological trajectories directly onto the GTS structure in the next section. The dataset looks like this:

kable(as.data.frame(Newtable[1:10,]))
Var1 Var2 Var3 Freq
No specialization CN 1974-1988 19
General specialization CN 1974-1988 16
AI-specific specialization CN 1974-1988 0
Coinciding specialization CN 1974-1988 0
No specialization JP 1974-1988 12
General specialization JP 1974-1988 7
AI-specific specialization JP 1974-1988 6
Coinciding specialization JP 1974-1988 10
No specialization KR 1974-1988 21
General specialization KR 1974-1988 14
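
The four-state classification described above can be sketched as a simple case_when() over the two RTA scores. The column names and values here are hypothetical; only the RTA >= 1 threshold convention follows the text.

```r
library(dplyr)
# Hypothetical sketch of the four-state coding of Var1 (columns and values invented):
rca <- tibble(RCA_Gen = c(0.4, 1.3, 0.2, 1.8),
              RCA_AI  = c(0.5, 0.7, 1.4, 2.1)) %>%
  mutate(Var1 = case_when(
    RCA_Gen >= 1 & RCA_AI >= 1 ~ "Coinciding specialization",   # break-in (3)
    RCA_Gen <  1 & RCA_AI >= 1 ~ "AI-specific specialization",  # break-through (2)
    RCA_Gen >= 1               ~ "General specialization",      # (1)
    TRUE                       ~ "No specialization"))          # (0)
rca
```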

1.3. Plotting technological spaces

Now that the underlying data and network structures are in place, this section focuses on their visualization. We will generate the key plots presented in the paper, illustrating both the static, global structure of technology and the dynamic, evolving space of AI.

1.3.1. Global technological space (GTS)

We begin by plotting the fundamental structure of the Global Technological Space. This initial visualization is geography-agnostic, meaning it shows the inherent relatedness between technological fields without any country-specific data. The node size corresponds to its centrality (degree), and nodes are clustered and colored by their broader technological sector. This plot serves as the canvas upon which we will later map national trajectories.

  g_tech_AI %>%  ggraph(layout =  coords_tech_AI) + 
  geom_edge_link(aes(width = weight), alpha = 0.4, colour = "grey") + 
  geom_node_point(aes(fill = sector, size = 1000^dgr, shape= sector))+ # 
  scale_shape_manual(values=c(21, 22, 23, 24, 25)) + scale_size("Degree", range = c(2, 12)) + 
  geom_node_text(aes(label = paste0(field_name, "\n(", name, ")")), size = 4, repel = TRUE) +  #field_name or name
  theme_graph(base_family = "sans")+  ggtitle("Global technological space: IPC Technological fields") + 
  theme(legend.title = element_text(size = 14), legend.text = element_text(size = 10)) + 
  guides(colour = guide_legend(override.aes = list(size=10)))+
  geom_mark_hull(aes(x = x, y=y, colour = sector, fill= sector,
                     linetype = sector), alpha = 0.15, expand = unit(2.5, "mm"), size = 1) 

Next, we overlay the country-specific specialization data onto the static GTS canvas. This allows us to visualize the technological trajectory of each country over the three time intervals. The shape of each node indicates the type of specialization (general, break-through, or break-in), while hulls highlight the clusters of specialization for each period. This composite visualization reveals how each nation's technological focus has evolved within the global structure. A horizontal bar plot is also generated to summarize the main indicators based on the country-specific specializations. Taking China as an example, the overlaid visualization and its companion bar plot look like this:

#GTS with specialisations per country
country_select <- c("CN", "US", "JP", "KR")
### First country: CN (set i = 2, 3, 4 for US, JP, KR)
i=1
IPC_RCAs_wide_simplified <- IPC_RCAs_Top4 %>% pivot_wider(id_cols = c(ctry_code, techn_field_nr, Label), 
    names_from = Period_sim,
    values_from = c(RCA_AI_Period, Total_RCA_2, RCA_Gen, RCA_AI, Round_general, Round_AI, Total_RCA), 
    names_glue = "{.value}_Period_{Period_sim}" )

  g_tech_AI %N>% left_join(IPC_RCAs_wide_simplified %>%
                             filter(ctry_code == country_select[i]) %>%
                             select(-ctry_code), by = c("name" = "techn_field_nr")) %>%
  mutate(Shape_Group_P1_Factor = factor(
    ifelse(is.na(Total_RCA_2_Period_1), "NA_Value", as.character(Total_RCA_2_Period_1)),
    levels = c("0", "1", "2", "3", "NA_Value"))) %>% ggraph(layout = coords_tech_AI) +
  geom_edge_link(aes(width = weight), alpha = 0.2, colour = "#CCCCCC", show.legend = FALSE) + 
  geom_node_point(aes(shape = Shape_Group_P1_Factor, 
                      size = 5, stroke = ifelse(Total_RCA_2_Period_1 == 3, 2.5, 1.3),
                      alpha = 1), color = "#FF3300", show.legend = c(shape=TRUE, size=FALSE, stroke=FALSE, alpha=FALSE, color=FALSE)) + 
  geom_node_point(aes(shape = factor(Total_RCA_2_Period_2),
                      size = 5.5, stroke = ifelse(Total_RCA_2_Period_2 == 3, 2.5, 1.3),
                      alpha = 1), color = "#3399FF", show.legend = FALSE) +
  geom_node_point(aes(shape = factor(Total_RCA_2_Period_3), 
                      size = 6.5,stroke = ifelse(Total_RCA_2_Period_3 == 3, 2.5, 1.3),
                      alpha = 1), color = "#009900", show.legend = FALSE) +
  scale_shape_manual(name = "Type of specialisation",
                     values = c("0" = 4, "1" = 1, "2" = 5, "3" = 2, "NA_Value" = 16), breaks = c("0", "1", "2", "3"),                                
                     labels = c("0" = "No specialisation", "1" = "General specialisation", 
                                "2" = "Break-through specialisation", "3" = "Break-in specialisation"), 
                     na.translate = FALSE, drop = FALSE) + scale_size("Degree", range = c(7, 18))+ 
  scale_alpha(guide = "none") + 
  #geom_node_label(aes(label = name), size = 2, repel = F) + 
  geom_mark_hull(aes(filter = Total_RCA_2_Period_1 > .99, x = x, y = y, fill = "Period 1", group = "Period 1"), 
                 concavity = .1, alpha = .11, linetype = "dotted",expand = unit(2, "mm"), size = .5, color = "#FF3300") + 
  geom_mark_hull(aes(filter = Total_RCA_2_Period_2 > .99, x = x, y = y, fill = "Period 2", group = "Period 2"),
                 concavity = .1, alpha = .11, linetype = "longdash",expand = unit(2, "mm"), size = .5, color = "#3399FF") +
  geom_mark_hull(aes(filter = Total_RCA_2_Period_3 > .99, x = x, y = y, fill = "Period 3", group = "Period 3"),
                 concavity = .1, alpha = .02, expand = unit(2, "mm"), size = 1, color = "#009900") +
  scale_fill_manual(name = "Interval colour (same for \nboth nodes and cluster)", # New legend for fill
                    values = c("Period 1" = "#FF3300", "Period 2" = "#3399FF", "Period 3" = "#009900"),
                    labels = c("Interval 1 (1974-1988)", "Interval 2 (1989-2003)", "Interval 3 (2004-2018)")) +
  theme_graph(base_family = "sans") +  theme(legend.position = "bottom", #right
                                             legend.box = "vertical", legend.title = element_text(size = 12, face = "bold"), 
                                             legend.text = element_text(size = 10), legend.key.size = unit(0.7, "cm") ) +
  ggtitle("d) Global technological space: China (1974-2018)") +
  geom_node_text(aes(label = name), size = 5, repel = TRUE) +  #field_name or name
  guides(shape = guide_legend(title.position = "top", 
                              override.aes = list(size = 5, stroke = 1.5, color = "black") ),
         colour = guide_legend(title.position = "top", 
                               override.aes = list(linetype = c("solid", "longdash", "dotted"), 
                                                   alpha = 1, size = 1, shape = NA) ))

  bar_plot_China <- IPC_RCAs_Top4[IPC_RCAs_Top4$ctry_code == country_select[i],] %>%
  arrange(Label, Period) %>%  group_by(Label) %>%
  mutate( general        = Total_RCA_2 == 1,
          break_through  = Total_RCA_2 == 2,
          break_in       = Total_RCA_2 == 3,
          sustained_general       = general       & lag(general, 1, default = FALSE),
          sustained_break_through = break_through & lag(break_through, 1, default = FALSE),
          sustained_break_in      = break_in      & lag(break_in, 1, default = FALSE)) %>%
  ungroup()

bar_plot_China <- bar_plot_China %>%
  group_by(Period) %>% summarise(`General case`                 = sum(general,                 na.rm = TRUE),
                                 `Break-through case`           = sum(break_through,           na.rm = TRUE),
                                 `Break-in case`                = sum(break_in,                na.rm = TRUE),
                                 `Sustained General case`       = sum(sustained_general,       na.rm = TRUE),
                                 `Sustained break-through case` = sum(sustained_break_through, na.rm = TRUE),
                                 `Sustained break-in case`      = sum(sustained_break_in,      na.rm = TRUE),
                                 .groups = "drop") %>% arrange(Period)

plot_long_China <- bar_plot_China |>
  pivot_longer(cols = -Period, names_to = "Indicator", values_to = "Count")

#order labels
plot_long_China$Indicator <- factor(plot_long_China$Indicator, levels = rev(c("General case", "Break-through case",  "Break-in case", 
                                                              "Sustained General case", "Sustained break-through case", "Sustained break-in case")))
plot_long_China$Period <- factor(plot_long_China$Period, levels = c("2004-2018", "1989-2003", "1974-1988"))

legend_order <- c(
  "General case", "Break-through case", "Break-in case",
  "Sustained General case", "Sustained break-through case", "Sustained break-in case"
)

  ggplot(plot_long_China, aes(x = factor(Period),y = Count, fill = Indicator)) +
  geom_col(position = position_dodge(width = .8), width = .7) +
  scale_fill_manual(values = c("General case"                 = "#FF3300",
                               "Sustained General case"       = "#993333",
                               "Break-in case"                = "#009900",
                               "Sustained break-in case"      = "#006633",
                               "Break-through case"           = "#3399FF",
                               "Sustained break-through case" = "#3333CC"),
                    breaks = legend_order) +
  guides(fill = guide_legend(nrow = 2, byrow = TRUE)) +
  labs(x = "Interval",y = "Number of cases", fill = NULL, title = NULL)+
  ggtitle("Summary of specialisations China") +
  theme_classic(base_size = 11) + theme(legend.position = "bottom")+ coord_flip()

The plotting code is structured to iterate through each of the four focus countries by changing the i variable. The resulting figures, each depicting a single country’s trajectory over three periods alongside a summary bar chart, are then saved.

1.3.2. AI-specific technological space (ATS)

Unlike the static GTS, the AI-specific Technological Space (ATS) is dynamic. Its structure is recalculated for each time interval, reflecting the rapid evolution of AI technology. Here, the relatedness between fields is based only on their co-occurrence within AI patents for that specific period. This approach allows us to observe which technological fields form the core of AI innovation at different points in time.

Starting with the first interval (1974-1988), the top 10 most central technological fields in the AI space are:

g_tech_AI %N>%   arrange(desc(dgr)) %>%  as_tibble() %>%  slice(1:10)

We use the previously calculated AI specialization data (AI_RCA) to highlight the core technologies in each period. A binary flag indicates whether AI has an RTA ≥ 1 in a given field (Period_sim indexes the interval, from 1 to 3), like this:

kable(as.data.frame(AI_RCA[1:6,]))
techn_field_nr RCA_AI_Period Period_sim Binary
1 0.0593493 1 0
2 0.0853538 1 0
3 0.3360644 1 0
4 1.0978493 1 1
5 1.6452825 1 1
6 16.6484958 1 1
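
The RTA behind RCA_AI_Period can be sketched as a Balassa-type index: AI's share of patenting in a field, divided by that field's share of all patenting. The counts and column names below are invented toy inputs; the real pipeline aggregates per field over each interval.

```r
library(dplyr)
# Toy sketch of a Balassa-style RTA and the RTA >= 1 binary flag (counts invented):
pat <- tibble(techn_field_nr = 1:3,
              ai_patents  = c(50, 10, 400),
              all_patents = c(5000, 4000, 6000))
pat <- pat %>% mutate(
  RCA_AI_Period = (ai_patents / sum(ai_patents)) / (all_patents / sum(all_patents)),
  Binary        = as.integer(RCA_AI_Period >= 1))
pat
```

A field with Binary = 1 is over-represented in AI patenting relative to its overall weight.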

The following code generates the ATS for the first interval (1974-1988). The nodes with labels are those where AI is specialized (RTA ≥ 1).

AI_RCA1 <- AI_RCA[AI_RCA$Period_sim == 1,]
p=1
  g_tech_AI %N>%
  left_join(AI_RCA1 %>% filter(Period_sim == p), by = c("name" = "techn_field_nr")) %>%
  ggraph(layout = coords_tech_AI) + 
  geom_edge_link(aes(width = weight), alpha = 0.2, colour = "#CCCCCC") +
  geom_node_point(aes(fill = sector, size = 1000^dgr, shape= sector)) +
  scale_shape_manual(values=c(21, 22, 23, 24, 25)) + labs(color   = "RCA")+ scale_size("Degree", range = c(2, 12)) +
  geom_node_text(aes(filter=Binary > .99, label = field_name), size = 6, repel = TRUE) +
  theme_graph(base_family = "sans") + guides(colour = guide_legend(override.aes = list(size=5)))+
  ggtitle("AI-specific technological space (1974-1988)") #

We do the same for the two other intervals and combine the three figures using the multiplot custom function. The resulting figure is saved at Files_created_with_the_code/figures/Figure_2_ATS_and_AI_core_technologies_3_intervals.jpg.

2. Generating Descriptive Figures

This section details the creation of the paper’s descriptive figures. These visualizations illustrate key trends in AI patenting and the evolution of national specialization strategies that motivate our main analysis.

2.1. Share of Break-in specialisations (Fig 6 and 7)

Here, we generate the plots showing the share of ‘break-in’ specializations for each country over time. This metric is central to our paper’s narrative and is calculated as the ratio of coinciding specializations (specialized in both the general field and its AI application) to the country’s total number of general specializations. A higher share indicates that a larger portion of a country’s established technological strengths is being integrated with AI. We first perform this analysis at the technological field level.

The data is processed to count the number of ‘coinciding’, ‘general only’, and ‘AI only’ specializations for each country and period. From these counts, the Share_coinciding is calculated. The resulting summary table is shown below.

SummaryAllData<-distinct(IPC_RCAs, ctry_code, Period, .keep_all = TRUE) 
colnames(SummaryAllData)[1] <- "Country"
head(SummaryAllData)
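
As a toy illustration of how the share is derived from the counts (the column names n_coinciding and n_general_only are assumptions, not the pipeline's actual names):

```r
library(dplyr)
# Hypothetical counts; only the ratio mirrors the Share_coinciding definition in the text:
# coinciding specialisations / total general specialisations (general-only + coinciding).
counts <- tibble(Country        = c("CN", "JP"),
                 n_coinciding   = c(10, 12),
                 n_general_only = c(30, 18))
counts <- counts %>%
  mutate(Share_coinciding = n_coinciding / (n_coinciding + n_general_only))
counts
```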

This summarized data is then used to plot the evolution of the break-in share for the four focus countries (Figure 6).

  ggplot(data=SummaryAllData, aes(x=Period, y=Share_coinciding, group=Country, shape = Country, color=Country)) +
  geom_point(aes(fill = Country), size=8) +   scale_shape_manual(values=c(21, 22, 24, 23)) +
  xlab("Interval") +  ylab("Share of break-in specialisations (%)") +
  theme_classic() +  geom_line(aes(color=Country), linetype = "dashed", size=1.5)+
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")) +
  scale_color_manual(values = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A"))

To ensure the robustness of our findings, we repeat the analysis at a more granular level of technological classification: the 4-digit IPC subclass. This serves as a check to confirm that the observed trends are not an artifact of the broader 35-field aggregation.

The resulting plot (Figure 7) confirms that the trends observed at the field level are consistent at the more detailed subclass level.

  ggplot(data=SummaryAllData4dig, aes(x=Period, y=Share_coinciding, group=Country, shape = Country, color=Country)) +
  geom_point(aes(fill = Country), size=8) + 
  scale_shape_manual(values=c(21, 22, 24, 23)) +
  xlab("Interval") +
  ylab("Share of break-in specialisations (%)") +
  theme_classic() +
  geom_line(aes(color=Country), linetype = "dashed", size=1.5)+
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")) +
  scale_color_manual(values = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")) 

2.2. Growth of AI Patents (Fig 1)

This section reproduces Figure 1 from the paper, which illustrates the dramatic growth in AI patenting since the 1970s. We use the raw AI patent data, aggregating the number of unique patent applications per country for each year. A log-10 scale is used for the y-axis to accommodate the exponential increase in patent counts and allow for a clearer comparison of growth trajectories between the four focus countries, resulting in the figure seen below.

  ggplot(data=test, aes(x=Year, y=log10(Number_of_AI_patents), group=Country, colour=Country, shape=Country)) +
  geom_line(size=1.2, aes(linetype=Country)) +
  geom_point(size=4) +  xlab("Year") +  ylab("Number of new AI registers [Log10]") + theme_classic() +
  scale_linetype_manual(values=c("twodash", "longdash", "solid", "solid")) +
  scale_shape_manual(values=c(16, 15, 17, 18)) + theme(legend.position="bottom") +
  theme(text = element_text(size = 15)) +  scale_y_continuous(limits=c(0,4)) + 
  geom_vline(xintercept = 1988, linetype = "dashed", size = 1, color = "grey") +
  geom_vline(xintercept = 2003, linetype = "dashed", size = 1, color = "grey") +
  scale_x_continuous(breaks = c(1974, 1988, 2003, 2018), limits=c(1974, 2018)) + scale_color_brewer(palette="Dark2") + 
  annotate("rect", xmin = 1974, xmax=1988, ymin = 3.6, ymax = 4, alpha = .01, color = "black") +
  annotate("text", x = 1981, y = 3.8, label = c("First Interval \n(1974-1988)"), size=4)+
  annotate("rect", xmin = 1988, xmax=2003, ymin = 3.6, ymax = 4, alpha = .01, color = "black") +
  annotate("text", x = 1996, y = 3.8, label = c("Second Interval \n(1989-2003)"), size=4) +
  annotate("rect", xmin = 2003, xmax=2018, ymin = 3.6, ymax = 4, alpha = .01, color = "black") +
  annotate("text", x = 2011, y = 3.8, label = c("Third Interval \n(2004-2018)"), size=4)

3. Robustness Checks: Permutation Analysis

To ensure that our findings are statistically robust and not merely the result of random chance, we conduct a permutation analysis. The core idea is to create a “null model” by generating thousands of randomized AI patent datasets. By comparing our actual results to the distribution of results from these random datasets, we can assess the statistical significance of our observations. This section details the creation of these permuted datasets and the subsequent recalculation of specialization metrics.

3.1. Permute the AI dataset

The first step is to generate the randomized, or permuted, datasets. For each of the four focus countries and for each time interval, we follow a specific procedure:

  1. Count the number of actual AI patents the country has in that interval.
  2. Randomly select the same number of patents from that country’s entire pool of patents (both AI and non-AI) for that interval.
  3. Treat this random sample as the new, ‘permuted’ AI dataset for that country.

In the full analysis, this process is repeated 1,000 times to create 1,000 counterfactual scenarios in which 'AI' patents are just random draws from a country's overall technological portfolio (note: num_permutations is set to 10 in this example for faster execution).
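
The core of one such draw can be sketched in a few lines. All objects here (ai_ids, all_ids) are toy stand-ins for the country- and interval-specific patent id sets used in the actual loop below.

```r
# Minimal sketch of one permutation draw for one country (toy ids, not real data):
set.seed(7)
ai_ids  <- c(101, 102, 103)   # step 1: the country's actual AI patent ids in the interval
all_ids <- 101:150            # pool of ALL the country's patents (AI and non-AI)
n_ai <- length(unique(ai_ids))
permuted_ids <- sample(unique(all_ids), n_ai)  # step 2: same-size random draw, no replacement
permuted_ids                  # step 3: these ids define the permuted 'AI' set
```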

We begin by reloading the patent data for the first interval (1974-1988) to establish the pool from which random patents will be drawn.

The resulting count of technological fields per country is:

kable(as.data.frame(region_tech_fields_1_df[1:6,]))
ctry_code techn_field_nr n_tech_reg
AD 20 1
AD 24 1
AD 28 3
AD 32 1
AD 33 1
AD 34 3

The following loop executes the permutation logic. For each of the 10 iterations, it samples a new set of random ‘AI’ patents for the target countries.

list_of_permuted_dfs <- vector("list", length = num_permutations)

for (p in 1:num_permutations) {
  if (p %% 100 == 0) print(paste("Permutation number:", p)) # Progress indicator
  
  # This dataframe will hold the permuted AI patents for target countries ONLY for THIS iteration
  permuted_ai_for_target_countries_iter <- data.frame()
  
  for (country in target_countries) {
    # 1. Identify and Count ACTUAL AI patents for the current country from the original AI dataset
    actual_ai_appln_ids_country <- ai_patents_period_1_df %>%
      filter(ctry_code == country) %>%
      distinct(appln_id) %>%
      pull(appln_id)
    
    n_ai_country <- length(actual_ai_appln_ids_country)
    
    if (n_ai_country == 0) {
      # print(paste("No AI patents found for", country, "in original AI data. Skipping for perm", p))
      next # Skip to the next country if no AI patents to replace
    }
    
    # 2. Prepare the pool of ALL patents for the current country from the general dataset
    country_all_patents_pool <- ipc_all_patents_first_period_df %>%
      filter(ctry_code == country) %>%
      distinct(appln_id)
    
    if (nrow(country_all_patents_pool) == 0) {
      # print(paste("No patents in general pool for", country, ". Skipping for perm", p))
      next
    }
    
    # Handle the edge case where the country's pool contains fewer unique
    # patents than the number of AI patents to replace. Since we sample
    # without replacement, the country is skipped for this permutation in
    # that case (sampling with replacement would be the alternative choice).
    if (nrow(country_all_patents_pool) < n_ai_country) {
      next
    }
    sample_size <- n_ai_country
    replace_sampling <- FALSE
    
    
    # 3. Randomly select an equivalent number of unique appln_ids from this country's general pool
    random_appln_ids_country <- sample(country_all_patents_pool$appln_id,
                                       size = sample_size, # Use adjusted sample_size
                                       replace = replace_sampling) # Use replace_sampling flag
    
    # 4. Get all rows for these randomly selected patents from the ipc_all_patents_first_period_df
    randomly_selected_patents_df_country <- ipc_all_patents_first_period_df %>%
      filter(appln_id %in% random_appln_ids_country & ctry_code == country)
    
    # 5. Add these randomly selected patents for the current country to the iteration's df
    if (nrow(randomly_selected_patents_df_country) > 0) {
      permuted_ai_for_target_countries_iter <- bind_rows(
        permuted_ai_for_target_countries_iter,
        randomly_selected_patents_df_country
      )
    }
  } # End of country loop
  
  # Add the permutation number to all rows of this iteration's dataframe
  if (nrow(permuted_ai_for_target_countries_iter) > 0) {
    permuted_ai_for_target_countries_iter$permutation_number <- p
  }
  
  # Store the dataframe for this iteration in the list
  list_of_permuted_dfs[[p]] <- permuted_ai_for_target_countries_iter
  
} # End of permutation loop

# Combine all permuted dataframes from the list into one large dataframe
final_permuted_dataset <- bind_rows(list_of_permuted_dfs)

The first and last six lines of the resulting permuted dataset look like this:

kable(as.data.frame(final_permuted_dataset[1:6,]))
appln_id ctry_code techn_field_nr permutation_number
16633049 JP 13 1
16633049 JP 16 1
16633049 JP 17 1
16633049 JP 23 1
16633049 JP 29 1
25198664 JP 1 1
tail(final_permuted_dataset)

This process creates a long-format dataframe where each permutation_number represents a complete, unique, randomized AI dataset. A summary for the number of patents for Japan and the US in the first 5 permutations is shown below (only these countries had patents in the first interval; China and South Korea join in the second interval).

final_permuted_dataset %>%
  filter(permutation_number <= 5) %>%
  group_by(permutation_number, ctry_code) %>%
  summarise(unique_appln_ids = n_distinct(appln_id), .groups = 'drop') %>%
  print(n=20)
## # A tibble: 10 × 3
##    permutation_number ctry_code unique_appln_ids
##                 <int> <chr>                <int>
##  1                  1 JP                     307
##  2                  1 US                     107
##  3                  2 JP                     307
##  4                  2 US                     107
##  5                  3 JP                     307
##  6                  3 US                     107
##  7                  4 JP                     307
##  8                  4 US                     107
##  9                  5 JP                     307
## 10                  5 US                     107

A crucial step is to handle the non-target countries. Since our hypothesis is not about them, their actual AI patents are kept constant and are simply replicated across all 1,000 permutations. This ensures that the global context remains stable while only the composition of AI within our focus countries is randomized.
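Replicating a fixed set of rows across all permutation numbers amounts to a cross-join, sketched here in base R with illustrative rows (the actual code operates on the project's dataframes):

```r
# Illustrative non-target-country AI patent rows (held constant)
not_selected_ai <- data.frame(appln_id  = c(16723353, 36147193),
                              ctry_code = c("FR", "IE"))
num_permutations <- 10

# merge() with no common columns produces the Cartesian product, duplicating
# every row once per permutation number 0..num_permutations
replicated <- merge(not_selected_ai,
                    data.frame(permutation_number = 0:num_permutations))
```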

The resulting dataset looks like this for the first and last 6 observations:

kable(as.data.frame(replicated_not_selected_ai_final[1:6,]))
appln_id ctry_code permutation_number
16723353 FR 0
16723353 FR 0
16723353 FR 0
36147193 IE 0
36147193 IE 0
36147193 IE 0
tail(replicated_not_selected_ai_final)

Finally, the permuted data for the target countries is combined with the replicated data for non-target countries. We also add the original, non-permuted AI dataset, labeling it as permutation_number = 0. This allows for direct comparison. The entire collection is then joined with the technological field information to prepare for the RTA calculation.
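The stacking of the actual data (labelled permutation 0) with the permuted draws follows the pattern below (a base-R sketch with hypothetical rows):

```r
# Original AI rows, tagged as permutation 0 for direct comparison
original_ai <- data.frame(appln_id = c(101, 102), ctry_code = c("JP", "US"),
                          permutation_number = 0)

# Rows drawn in the first randomized iteration
permuted_ai <- data.frame(appln_id = c(201, 202), ctry_code = c("JP", "US"),
                          permutation_number = 1)

combined <- rbind(original_ai, permuted_ai)
table(combined$permutation_number)  # one block of rows per permutation number
```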

The result is a single dataframe containing the original AI data (permutation 0) and 10 random variations (again, 10 is used for this illustrative example; the number of permutations was set to 1000 in the files that are saved in the folder Files_created_with_the_code/data/files_code_Fields_analysis/robustness/). A summary of this file for this illustrative example looks like this:

table(final_permuted_dataset$permutation_number)
## 
##    0    1    2    3    4    5    6    7    8    9   10 
## 1418 1180 1240 1224 1161 1223 1251 1198 1225 1229 1154

3.2. Calculate AI-specific specialisations

With the 11 datasets (1 actual + 10 permuted) assembled, we now repeat the exact same Revealed Technological Advantage (RTA) calculation performed in Section 1.1. This is done for each permutation, allowing us to generate a distribution of RTA scores for each technological field under the null hypothesis (i.e., when ‘AI’ is random).
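The RTA itself is a location quotient: a country's share of a field relative to that field's global share, with values above 1 indicating specialisation. A minimal base-R sketch on a toy count matrix (illustrative numbers; the pipeline itself calls the `location_quotient()` function shown below):

```r
# Toy country-by-field patent count matrix
x <- matrix(c(8, 2,
              4, 6),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("JP", "US"), c("f1", "f2")))

# RTA_{c,f} = (x_{c,f} / rowSums) / (colSums / total):
# the country's field share divided by the field's global share
rta <- sweep(x / rowSums(x), 2, colSums(x) / sum(x), "/")
```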

The code below iterates through each permutation_number, calculates the AI-specific RTAs for that dataset, and stores the results.

list_of_rca_dfs <- region_tech_fields_perm_df %>%
  group_by(permutation_number) %>%
  group_split() %>% # This splits the df into a list of dfs, one for each permutation
  purrr::map(~{
    current_permutation_number <- unique(.x$permutation_number)
    print(paste("Processing RCA for permutation_number:", current_permutation_number))
    
    # Build the country-by-field count matrix for this permutation's data
    mat_reg_tech_perm_AI <- .x %>%
      select(-permutation_number) %>% # dropped before pivoting; re-added after the RCA step
      arrange(techn_field_nr, ctry_code) %>%
      pivot_wider(names_from = techn_field_nr,
                  values_from = n_tech_reg,
                  values_fill = 0) # missing country-field combinations become 0
    
    # Check if ctry_code column exists and is not empty
    if (!"ctry_code" %in% names(mat_reg_tech_perm_AI) || nrow(mat_reg_tech_perm_AI) == 0 || all(is.na(mat_reg_tech_perm_AI$ctry_code))) {
      print(paste("Skipping permutation", current_permutation_number, "due to missing ctry_code or empty data after pivot."))
      return(NULL) # Return NULL or an empty tibble
    }
    
    # Check for duplicate ctry_codes which would prevent rownames_to_column
    if (any(duplicated(mat_reg_tech_perm_AI$ctry_code))) {
      print(paste("Warning: Duplicate ctry_code found for permutation", current_permutation_number, ". Aggregating or handling needed."))
      return(tibble(permutation_number = current_permutation_number, error="duplicate ctry_code"))
    }
    
    
    mat_reg_tech_perm_AI <- mat_reg_tech_perm_AI %>%
      remove_rownames() %>%
      column_to_rownames(var = "ctry_code") %>%
      as.matrix() %>%
      round() # counts are already integers, so rounding is a harmless safeguard
    
    # RCA calculation
    # Ensure matrix is suitable (e.g., no NA/NaN/Inf that location_quotient can't handle)
    if (nrow(mat_reg_tech_perm_AI) == 0 || ncol(mat_reg_tech_perm_AI) == 0) {
      print(paste("Skipping RCA for permutation", current_permutation_number, "due to empty matrix."))
      return(NULL)
    }
    
    
    # Rows or columns summing to zero would yield NaN/Inf RCAs;
    # the tryCatch below guards against errors from location_quotient().
    
    rca_results_perm <- tryCatch({
      mat_reg_tech_perm_AI %>%
        location_quotient(binary = FALSE) %>% 
        as.data.frame() %>%
        rownames_to_column("ctry_code") %>%
        as_tibble() %>%
        gather(key = "techn_field_nr", value = "RCA", -ctry_code) %>%
        arrange(ctry_code, techn_field_nr) %>%
        mutate(permutation_number = current_permutation_number) # Add back permutation number
    }, error = function(e) {
      print(paste("Error in location_quotient for permutation", current_permutation_number, ":", e$message))
      return(tibble(permutation_number = current_permutation_number, ctry_code=NA, techn_field_nr=NA, RCA=NA, error_message = e$message)) # Return an empty or error-marked tibble
    })
    
    return(rca_results_perm)
  })
## [1] "Processing RCA for permutation_number: 0"
## [1] "Processing RCA for permutation_number: 1"
## [1] "Processing RCA for permutation_number: 2"
## [1] "Processing RCA for permutation_number: 3"
## [1] "Processing RCA for permutation_number: 4"
## [1] "Processing RCA for permutation_number: 5"
## [1] "Processing RCA for permutation_number: 6"
## [1] "Processing RCA for permutation_number: 7"
## [1] "Processing RCA for permutation_number: 8"
## [1] "Processing RCA for permutation_number: 9"
## [1] "Processing RCA for permutation_number: 10"
# Combine the list of RCA dataframes into one final dataframe
final_rca_all_permutations_df <- bind_rows(list_of_rca_dfs)

The final output is a comprehensive dataframe containing the calculated RTA scores for every country, technology, and permutation, which looks like this for the first and last 6 rows:

kable(as.data.frame(final_rca_all_permutations_df[1:6,]))
ctry_code techn_field_nr RCA permutation_number
AT 1 0 0
AT 10 0 0
AT 11 0 0
AT 12 0 0
AT 13 0 0
AT 17 0 0
tail(final_rca_all_permutations_df)

This process is repeated for all three time intervals, and the resulting dataframes are saved at Files_created_with_the_code/data/files_code_Fields_analysis/robustness/. These files form the basis for the statistical tests in our econometric analysis. It is also worth noting that these permutations are repeated for several distinct interval lengths, i.e., 15-year, 10-year, 5-year, and 1-year intervals.

4. Regression Analysis

This final section presents the econometric analysis designed to formally test our paper’s hypotheses. Using the data prepared in the previous steps, we construct a panel dataset covering our four focus countries across nine 5-year intervals (from 1974-1978 to 2014-2018). We then run a series of regression models to investigate the factors influencing the emergence and persistence of different types of technological specializations.

The following code block handles the final data preparation, loading the pre-calculated metrics for relative density and specializations, and merging them into a single dataframe ready for regression.
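The merge logic can be sketched as follows (hypothetical file contents and column names, chosen purely for illustration; the real variables are those shown in the preview table):

```r
# Hypothetical inputs: one file with relatedness densities, one with the
# specialization measures, keyed by country and period
density_df <- data.frame(Country = c("Japan", "US"),
                         period  = "1974-1978",
                         rel_density = c(57, 50))
spec_df    <- data.frame(Country = c("Japan", "US"),
                         period  = "1974-1978",
                         share_break_in = c(0.0500, 0.0625))

# Inner join on the panel identifiers
regression_data <- merge(density_df, spec_df, by = c("Country", "period"))
```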

The final dataset for regression looks like this:

head(regression_data_renamed) %>%
  knitr::kable(
    caption = "Preview of the Final Regression Dataset",
    booktabs = TRUE # A style option for prettier tables
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )
Preview of the Final Regression Dataset
Country Country’s techn. rel. dens. period no_specialization general_specialization ai_specific_specialization No. of ‘break-in’ spec. actual_share_coinciding actual_share_round_ai actual_share_round_general No. of sustained ‘break-in’ spec. actual_persistent_just_general actual_persistent_just_ai No. of sustained ‘AI-specific’ spec. actual_n_ai_prev_coinciding actual_n_coinciding_prev_ai actual_n_ai_prev_gen actual_n_persistent_core_fields actual_n_persistent_not_core_fields actual_n_persistent_coin_core_fields actual_n_persistent_coin_not_core_fields actual_ai_core_fields actual_ai_not_core_fields No. of sustained ‘general’ spec. No. of ‘general’ spec. double_check total_specializations No. of ‘AI-specific’ spec. Share of ‘break-in’ spec. Interval
Japan 57 1974-1978 15 19 0 1 0.0500000 0.0285714 0.5714286 0 0 0 0 0 0 0 0 0 0 0 1 0 0 20 35 20 1 0.0500000 1974-1978
South Korea 38 1974-1978 22 13 0 0 0.0000000 0.0000000 0.3714286 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 35 13 0 0.0000000 1974-1978
US 50 1974-1978 19 15 0 1 0.0625000 0.0285714 0.4571429 0 0 0 0 0 0 0 0 0 0 0 0 1 0 16 35 16 1 0.0625000 1974-1978
China 35 1974-1978 24 11 0 0 0.0000000 0.0000000 0.3142857 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 35 11 0 0.0000000 1974-1978
China 32 1979-1983 23 12 0 0 0.0000000 0.0000000 0.3428571 0 5 0 0 0 0 0 0 0 0 0 0 0 5 12 35 12 0 0.0000000 1979-1983
Japan 56 1979-1983 17 16 0 2 0.1111111 0.0571429 0.5142857 1 15 0 1 0 0 0 1 0 1 0 2 0 17 18 35 18 2 0.1111111 1979-1983

4.1. Main models (i.e., the ones from the paper)

Our main analysis consists of two sets of Ordinary Least Squares (OLS) models. The first set (Table 3) examines the factors that influence a country’s share of ‘break-in’ specializations. The second set (Table 4) investigates the determinants of the persistence of specializations over time.

The first three models test the effect of technological relatedness density on the share of break-in specializations. Model 1 provides a baseline, Model 2 adds control variables for the number of general and AI-specific specializations, and Model 3 includes country fixed effects. The estimations for the first three models are:
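The three specifications can be sketched with lm() as follows (simulated stand-in data and hypothetical variable names; the real models use the panel assembled above):

```r
# Simulated stand-in panel: 4 countries x 9 intervals = 36 observations
set.seed(1)
panel <- data.frame(
  share_break_in = runif(36),
  rel_density    = rnorm(36, mean = 45, sd = 10),
  n_general      = rpois(36, 15),
  n_ai_specific  = rpois(36, 3),
  Interval       = factor(rep(paste0("t", 1:9), times = 4)),
  Country        = factor(rep(c("JP", "KR", "US", "CN"), each = 9))
)

m1 <- lm(share_break_in ~ rel_density + Interval, data = panel)  # baseline
m2 <- update(m1, . ~ . + n_general + n_ai_specific)              # + controls
m3 <- update(m2, . ~ . + Country)                                # + country FE
```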

Effects on the share of break-ins - OLS regression
Dependent variable:
Share of ‘break-in’ spec.
(1) (2) (3)
Country’s techn. rel. dens. 0.001 (0.005) 0.001 (0.006) -0.001 (0.006)
Interval1979-1983 0.004 (0.132) -0.012 (0.074) -0.013 (0.075)
Interval1984-1988 0.164 (0.132) -0.016 (0.079) -0.045 (0.084)
Interval1989-1993 0.348** (0.131) 0.019 (0.085) -0.005 (0.089)
Interval1994-1998 0.405*** (0.131) 0.005 (0.093) -0.002 (0.095)
Interval1999-2003 0.455*** (0.136) -0.007 (0.098) -0.034 (0.103)
Interval2004-2008 0.535*** (0.132) 0.080 (0.095) 0.048 (0.100)
Interval2009-2013 0.534*** (0.131) 0.065 (0.096) 0.042 (0.101)
Interval2014-2018 0.666*** (0.132) 0.118 (0.105) 0.093 (0.110)
CountryJapan -0.018 (0.055)
CountrySouth Korea 0.063 (0.054)
CountryUS 0.049 (0.051)
No. of ‘general’ spec. -0.010 (0.016) 0.002 (0.018)
No. of ‘AI-specific’ spec. 0.036*** (0.005) 0.038*** (0.005)
Constant -0.036 (0.227) 0.100 (0.130) 0.017 (0.152)
Observations 36 36 36
R2 0.673 0.905 0.915
Adjusted R2 0.560 0.862 0.859
Residual Std. Error 0.185 (df = 26) 0.104 (df = 24) 0.105 (df = 21)
F Statistic 5.958*** (df = 9; 26) 20.802*** (df = 11; 24) 16.181*** (df = 14; 21)
Note: *p<0.1; **p<0.05; ***p<0.01

The next set of models shifts the focus to persistence. The dependent variable is now the count of specializations of a certain type that are sustained from the previous period. These models help us understand what factors contribute to the durability of a country’s technological advantages. The estimations for these three models are:

Effects on persisting specialisations - OLS regression
Dependent variable:
No. of sustained ‘break-in’ spec. No. of sustained ‘AI-specific’ spec.
(1) (2) (3)
Country’s techn. rel. dens. 0.039 (0.082) 0.015 (0.031) 0.006 (0.078)
Interval1979-1983 0.480 (1.092) -1.431 (1.171) -0.124 (1.044)
Interval1984-1988 -0.941 (1.192) -1.901 (1.120) -0.628 (1.151)
Interval1989-1993 0.423 (1.258) -1.942 (1.330) -0.292 (1.201)
Interval1994-1998 1.854 (1.368) -1.263 (1.148) 0.158 (1.354)
Interval1999-2003 1.708 (1.477) -1.280 (1.234) -0.520 (1.447)
Interval2004-2008 0.919 (1.405) -1.812 (1.245) 0.807 (1.350)
Interval2009-2013 3.546** (1.420) 0.012 (1.305) -1.135 (1.524)
Interval2014-2018 2.861* (1.572) -0.420 (1.297) -0.458 (1.601)
No. of ‘break-in’ spec. 0.413* (0.226) -0.182 (0.231)
No. of ‘general’ spec. 0.113 (0.239) -0.053 (0.228)
No. of ‘AI-specific’ spec. -0.067 (0.150) 0.297** (0.143)
No. of sustained ‘break-in’ spec. 1.115*** (0.199)
No. of sustained ‘general’ spec. 0.133* (0.074)
No. of sustained ‘AI-specific’ spec. 0.525*** (0.070)
Constant -3.642* (1.955) -0.669 (1.464) 0.482 (1.997)
Observations 36 36 36
R2 0.799 0.914 0.921
Adjusted R2 0.694 0.874 0.875
Residual Std. Error 1.531 (df = 23) 0.981 (df = 24) 1.458 (df = 22)
F Statistic 7.606*** (df = 12; 23) 23.111*** (df = 11; 24) 19.828*** (df = 13; 22)
Note: *p<0.1; **p<0.05; ***p<0.01

4.2. Extensions (i.e., models used for additional robustness, not included in the paper)

To ensure the robustness of our results, we re-estimate our models using specifications better suited to the nature of our dependent variables.

  • For the models predicting the share of break-ins (a value between 0 and 1), we use Beta regression.

  • For the models predicting the count of persistent specializations, we use Poisson and Negative Binomial regressions, which are designed for count data.
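Under these distributional assumptions, the fitting calls look like this (simulated data and hypothetical variable names; glm() is base R, while glm.nb() comes from MASS and betareg() from the CRAN package betareg):

```r
# Simulated stand-in for the persistence panel
set.seed(2)
df <- data.frame(
  n_sustained = rpois(36, lambda = 2),
  rel_density = rnorm(36, mean = 45, sd = 10),
  Interval    = factor(rep(paste0("t", 1:9), times = 4))
)

# Poisson model for the count outcome
m_pois <- glm(n_sustained ~ rel_density + Interval,
              family = poisson, data = df)

# The Negative Binomial and Beta variants are fit analogously:
#   MASS::glm.nb(n_sustained ~ rel_density + Interval, data = df)
#   betareg::betareg(share_break_in ~ rel_density + Interval, data = ...)
```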

The Beta regression results for the share of break-ins are presented below.

Effects on the share of break-ins - Beta regression
Dependent variable:
Share of ‘break-in’ spec.
(1) (2) (3)
Country’s techn. rel. dens. 0.018 (0.020) 0.015 (0.025) -0.004 (0.023)
Interval1979-1983 -0.046 (0.675) -0.156 (0.582) -0.317 (0.564)
Interval1984-1988 0.442 (0.654) 0.355 (0.536) 0.216 (0.508)
Interval1989-1993 2.031*** (0.604) 0.882* (0.513) 0.707 (0.492)
Interval1994-1998 2.199*** (0.605) 0.696 (0.538) 0.588 (0.514)
Interval1999-2003 2.553*** (0.629) 0.677 (0.553) 0.442 (0.535)
Interval2004-2008 2.867*** (0.611) 1.091** (0.545) 0.883* (0.522)
Interval2009-2013 2.784*** (0.608) 0.968* (0.545) 0.797 (0.522)
Interval2014-2018 3.201*** (0.622) 1.082* (0.586) 0.923* (0.561)
CountryJapan -0.004 (0.230)
CountrySouth Korea 0.324 (0.250)
CountryUS 0.526** (0.220)
No. of ‘general’ spec. -0.088 (0.069) -0.041 (0.072)
No. of ‘AI-specific’ spec. 0.188*** (0.023) 0.208*** (0.023)
Constant -3.342*** (1.052) -2.609*** (0.762) -2.769*** (0.818)
Observations 36 36 36
R2 0.735 0.895 0.895
Log Likelihood 28.419 48.647 52.027
Note: *p<0.1; **p<0.05; ***p<0.01

Compared to the OLS-based Table 3, the estimations from the Beta regression are highly consistent. The main differences are minor shifts in significance levels for some time intervals and the US country dummy. Importantly, the main variable of interest, “Country’s techn. rel. dens.”, retains its significance and sign, confirming the robustness of our primary finding.

Next, we test the robustness of the persistence models. The results from the Poisson regression are shown first, followed by the Negative Binomial models, which can be more appropriate if the count data is over-dispersed.

Effects on persisting specialisations - Poisson regression
Dependent variable:
No. of sustained ‘break-in’ spec. No. of sustained ‘AI-specific’ spec.
(1) (2) (3)
Country’s techn. rel. dens. 0.039 (0.031) 0.010 (0.028) 0.021 (0.025)
Interval1979-1983 16.895 (2,728.392) 15.932 (2,842.308) 16.842 (2,853.177)
Interval1984-1988 16.295 (2,728.392) 15.805 (2,842.308) 17.421 (2,853.176)
Interval1989-1993 18.352 (2,728.392) 17.436 (2,842.308) 18.303 (2,853.176)
Interval1994-1998 18.704 (2,728.392) 17.984 (2,842.308) 18.490 (2,853.176)
Interval1999-2003 18.908 (2,728.392) 18.021 (2,842.308) 18.434 (2,853.176)
Interval2004-2008 18.676 (2,728.392) 17.928 (2,842.308) 18.688 (2,853.176)
Interval2009-2013 19.134 (2,728.392) 18.119 (2,842.308) 18.196 (2,853.176)
Interval2014-2018 18.911 (2,728.392) 18.078 (2,842.308) 18.282 (2,853.176)
No. of ‘break-in’ spec. 0.048 (0.086) -0.054 (0.072)
No. of ‘general’ spec. 0.016 (0.091) -0.102 (0.075)
No. of ‘AI-specific’ spec. 0.037 (0.062) 0.111** (0.050)
No. of sustained ‘break-in’ spec. 0.166*** (0.063)
No. of sustained ‘general’ spec. 0.072 (0.058)
No. of sustained ‘AI-specific’ spec. 0.106*** (0.039)
Constant -20.368 (2,728.392) -18.733 (2,842.308) -17.771 (2,853.177)
Observations 36 36 36
Log Likelihood -47.003 -45.297 -53.420
Akaike Inf. Crit. 120.005 114.594 134.839
Note: *p<0.1; **p<0.05; ***p<0.01

For the Negative binomial, the results look like this:

Effects on persisting specialisations - Negative binomial regression
Dependent variable:
No. of sustained ‘break-in’ spec. No. of sustained ‘AI-specific’ spec.
(1) (2) (3)
Country’s techn. rel. dens. 0.039 (0.031) 0.010 (0.028) 0.021 (0.025)
Interval1979-1983 17.895 (4,498.357) 16.932 (4,686.174) 17.842 (4,704.093)
Interval1984-1988 17.295 (4,498.357) 16.805 (4,686.174) 18.421 (4,704.093)
Interval1989-1993 19.352 (4,498.357) 18.436 (4,686.174) 19.303 (4,704.093)
Interval1994-1998 19.704 (4,498.357) 18.984 (4,686.173) 19.490 (4,704.093)
Interval1999-2003 19.908 (4,498.357) 19.021 (4,686.173) 19.434 (4,704.093)
Interval2004-2008 19.676 (4,498.357) 18.928 (4,686.173) 19.688 (4,704.093)
Interval2009-2013 20.134 (4,498.357) 19.119 (4,686.173) 19.196 (4,704.093)
Interval2014-2018 19.911 (4,498.357) 19.078 (4,686.173) 19.282 (4,704.093)
No. of ‘break-in’ spec. 0.048 (0.086) -0.054 (0.072)
No. of ‘general’ spec. 0.016 (0.091) -0.102 (0.075)
No. of ‘AI-specific’ spec. 0.037 (0.062) 0.111** (0.050)
No. of sustained ‘break-in’ spec. 0.166*** (0.063)
No. of sustained ‘general’ spec. 0.072 (0.058)
No. of sustained ‘AI-specific’ spec. 0.106*** (0.039)
Constant -21.368 (4,498.357) -19.733 (4,686.174) -18.771 (4,704.093)
Observations 36 36 36
Log Likelihood -48.003 -46.298 -54.420
theta 46,856.060 (755,064.600) 45,122.010 (654,584.900) 52,915.220 (653,410.000)
Akaike Inf. Crit. 122.006 116.595 136.841
Note: *p<0.1; **p<0.05; ***p<0.01

The results from both the Poisson and Negative Binomial regressions are nearly identical to each other and largely consistent with the OLS models. While some variables with marginal significance in the OLS models lose significance here, the main relationships of interest hold. This confirms that our findings regarding the persistence of specializations are not sensitive to the choice of a linear versus a count data model specification.