💡 Welcome to the Methodological Appendix!

This file provides a fully transparent and reproducible account of the data analysis performed for the paper, ‘Breaking in or Breaking Through? How Local Specialisations Shape the Integration of AI Technologies’ (doi: https://doi.org/10.1080/10438599.2025.2558626). Its goal is to offer a clear roadmap of the methodological steps, from initial data sourcing to the final regression analysis.


This document details the complete analytical pipeline, structured into five key parts:

1. Calculating Technological Specializations (Section 1.1) ⚙️
This section describes how raw patent data is processed to calculate Revealed Technological Advantage (RTA) for different countries and for AI as a whole. This is performed across three distinct time intervals to capture technological evolution.
2. Constructing and Visualizing Technological Spaces (Sections 1.2 & 1.3) 🌐
Here, we explain the creation of the Global Technological Space (GTS), based on the co-occurrence of technological fields in patents, and the dynamic AI-specific Technological Space (ATS). This section also covers the visualization of national trajectories within these spaces.
3. Generating Supporting Figures (Section 2) 📊
This part outlines the code used to create the descriptive figures presented in the paper, such as the growth of AI patents and the share of specialized fields.
4. Robustness Checks & Permutation Analysis (Section 3) 🎲
This section details the permutation-based robustness checks, which verify that our findings are not driven by random chance.
5. Econometric Analysis (Section 4) 📈
Finally, we present the regression models used to formally test our hypotheses, including the data setup, model specifications, and interpretation of the results.

1. Technological Spaces Based on Technological Fields

1.1. Calculating Specializations for Different Time Intervals

This section details the foundational step of our analysis: calculating the technological specializations of countries and of AI itself. We begin by loading extensive patent datasets and defining several custom functions to streamline the process. The end goal is to compute Revealed Technological Advantage (RTA) scores for three distinct time intervals, which will later serve as the basis for constructing our technological spaces.

Data Loading and Initial Setup

We start by loading the necessary R libraries and defining a set of custom functions that will be used repeatedly for data aggregation and weighting. The primary dataset is a large file containing patent applications and their associated inventor locations, which is loaded in manageable chunks to optimize memory usage.

A preview of the primary patent data (ipc_all_patents_part2_df) is shown below. Each row represents a patent application (appln_id) linked to an inventor’s country (ctry_code) and a technological field (techn_field_nr).

kable(as.data.frame(ipc_all_patents_part2_df[1:6,]))
appln_id ctry_code techn_field_nr weight priority_year
203438 JP 2 9 2000
203438 JP 2 9 2000
203438 JP 9 1 2000
203438 JP 9 1 2000
203521 US 15 375 1996
203521 US 16 625 1996

Note that the original weight column from PATSTAT is disregarded in our analysis. We recalculate a fractional weight internally to ensure that each patent application has a total weight of 1, distributed equally among its assigned technological fields.
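The fractional weighting can be sketched in base R as follows (illustrative rows; the actual pipeline performs this inside its custom functions, and the column name field_weight follows the tables shown later in this section):

```r
# Base-R sketch of the fractional weighting described above
df <- data.frame(
  appln_id       = c(203438, 203438, 203521, 203521),
  ctry_code      = c("JP", "JP", "US", "US"),
  techn_field_nr = c(2, 9, 15, 16)
)

# Split a total weight of 1 equally across each application's fields:
df$field_weight <- ave(df$appln_id, df$appln_id,
                       FUN = function(x) 1 / length(x))

# Each patent application now sums to exactly 1:
aggregate(field_weight ~ appln_id, df, sum)
```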

Next, we load two supplementary datasets:

  1. AI Patent Data (other_files/IPCs_AI.csv): A curated list of patent applications identified as being related to Artificial Intelligence.

  2. IPC Technology Names (other_files/ipc_technology.csv): A reference file that maps technological field numbers to their descriptive names and sectors.

The AI Patent Data (ai_patents_df) looks like this:

head(ai_patents_df)

The IPC Technology Names (ipc_names_df) looks like this:

kable(as.data.frame(ipc_names_df[1:6,]))
field_nr sector field_name techn_field_nr
1 Electrical engineering Electrical machinery, apparatus, energy 1
2 Electrical engineering Audio-visual technology 2
3 Electrical engineering Telecommunications 3
4 Electrical engineering Digital communication 4
5 Electrical engineering Basic communication processes 5
6 Electrical engineering Computer technology 6

Calculating Specializations for Interval 1 (1974-1988)

The core of this section involves calculating the specialization scores for our first time interval, 1974-1988. This process is repeated identically for the subsequent two intervals.

The first step is to filter the main patent dataset for the specified period. We then apply our custom functions, group_by_applnID() and group_by_ctry_and_techn_field(), to fractionally count patent activities.

The group_by_applnID() function assigns an equal weight to each technological field within a single patent. For instance, if a patent is classified under four fields, each field receives a weight of 0.25. The result is a weighted dataset:

kable(as.data.frame(region_tech_fields_1_df[1:6,]))
appln_id ctry_code techn_field_nr field_weight
206163 DE 1 0.50
206163 DE 1 0.50
214019 FR 9 0.25
214019 FR 9 0.25
214019 FR 29 0.25
214019 FR 29 0.25

Next, group_by_ctry_and_techn_field() aggregates these weights, summing them up for each country-technology pair. This yields the total fractional count of patents for each country in each technological field.

kable(as.data.frame(region_tech_fields_1_df[1:6,]))
ctry_code techn_field_nr n_tech_reg
AD 20 1
AD 24 1
AD 28 3
AD 32 1
AD 33 1
AD 34 3
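The aggregation performed by group_by_ctry_and_techn_field() can be sketched in base R (illustrative data, not the pipeline code itself):

```r
# Sum the fractional field weights for every country-technology pair:
df <- data.frame(
  ctry_code      = c("DE", "DE", "FR", "FR"),
  techn_field_nr = c(1, 1, 9, 29),
  field_weight   = c(0.5, 0.5, 0.25, 0.25)
)

n_tech_reg <- aggregate(field_weight ~ ctry_code + techn_field_nr, df, sum)
names(n_tech_reg)[3] <- "n_tech_reg"
n_tech_reg  # total fractional patent count per country and field
```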

This aggregated data, saved as reg_tech_FirstPeriod.csv, is transformed into a country-technology matrix. The matrix rows represent countries, columns represent technological fields, and the values are the fractional patent counts. It looks like this:

kable(as.matrix(mat_reg_tech1[1:20, 1:12]), caption = "Sample of the Country-technology matrix")
Sample of the Country-technology matrix
1 2 3 4 5 6 7 8 9 10 11 12
AG 1 0 0 0 0 0 0 0 0 0 0 0
AM 1 0 0 0 0 0 0 0 0 0 0 0
AR 11 7 6 1 2 2 0 0 3 12 1 6
AT 758 299 198 23 131 64 1 28 279 461 38 130
AU 1403 606 409 48 178 272 10 81 509 1321 149 659
BA 0 0 0 0 0 0 0 0 0 0 0 0
BB 0 0 0 0 0 0 0 0 0 1 0 0
BE 261 97 159 32 54 43 0 26 182 196 65 80
BG 992 335 166 36 333 455 0 114 228 1284 137 340
BI 1 0 0 0 0 0 0 0 0 0 0 0
BM 2 2 0 0 0 0 0 0 0 0 0 0
BO 2 0 0 0 0 0 0 0 0 1 0 1
BR 1759 647 754 47 138 325 12 50 296 962 48 851
BS 4 0 0 0 0 0 0 0 0 0 0 0
BU 0 0 0 0 0 0 0 0 0 1 0 0
CA 3827 1409 1474 254 553 784 14 372 1368 2915 252 1041
CH 2194 788 372 108 299 242 4 225 841 2526 120 754
CL 0 1 1 0 0 1 0 0 1 1 0 2
CN 931 233 146 20 106 317 0 132 306 994 45 223
CO 8 2 0 0 0 0 0 0 0 0 0 4
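Casting the long country-field table into this matrix can be done, for example, with base R's xtabs() (a sketch on toy data; the pipeline uses its own reshaping step):

```r
df <- data.frame(
  ctry_code      = c("AD", "AD", "AT"),
  techn_field_nr = c(20, 24, 20),
  n_tech_reg     = c(1, 1, 5)
)

# Rows become countries, columns technological fields,
# cells the fractional patent counts (missing pairs become 0):
mat <- xtabs(n_tech_reg ~ ctry_code + techn_field_nr, df)
mat
```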

Finally, we use this matrix to calculate the Revealed Technological Advantage (RTA) for each country in each field. RTA is a continuous index that compares a country's share of patents in a given technology with that technology's share in the global patent portfolio. An RTA value greater than or equal to 1 indicates a specialization.

kable(as.data.frame(reg_RCA1_df[1:6,]))
ctry_code techn_field_nr RCA
AG 1 6.4097284
AM 1 12.8194569
AR 1 0.4209374
AT 1 0.7479332
AU 1 0.4791203
BA 1 0.0000000
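The RTA computation follows the standard Balassa (location-quotient) logic, sketched here in base R on a toy country-technology matrix (the pipeline's own implementation may differ in detail):

```r
# Toy matrix: rows = countries, columns = technological fields
mat <- rbind(A = c(10, 90),
             B = c(50, 50))

# RTA = (field's share in the country's portfolio) /
#       (field's share in the world portfolio)
rta <- (mat / rowSums(mat)) /
  matrix(colSums(mat) / sum(mat), nrow(mat), ncol(mat), byrow = TRUE)
rta  # values >= 1 indicate a specialization
```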

This entire process is then repeated, this time using only the AI-related patents from the first interval to calculate AI-specific RTAs for each country. For the AI patents, the RTAs look like this:

kable(as.data.frame(reg_RCA1_AI_df[1:12,]))
ctry_code techn_field_nr RCA
AT 1 0
AT 10 0
AT 11 0
AT 12 0
AT 13 0
AT 17 0
AT 2 0
AT 20 0
AT 23 0
AT 24 0
AT 25 0
AT 26 0

The general and AI-specific RTA dataframes are then merged. The resulting file for the first interval shows, for each country and technological field, both its general specialization (RCA_Gen) and its AI-specific specialization (RCA_AI), as highlighted below for the whole dataset, and for Japan as an example.

#Resulting file:
kable(as.data.frame(rca_data_period_1_df[1:6,]))
ctry_code techn_field_nr RCA_Gen RCA_AI Period
AD 1 0 NA 1974-1988
AD 10 0 NA 1974-1988
AD 11 0 NA 1974-1988
AD 12 0 NA 1974-1988
AD 13 0 NA 1974-1988
AD 14 0 NA 1974-1988
#Example Japan:
kable(as.data.frame(rca_data_period_1_df[rca_data_period_1_df$ctry_code == "JP",][1:6,]))
ctry_code techn_field_nr RCA_Gen RCA_AI Period
2731 JP 1 1.1268685 1.4025974 1974-1988
2732 JP 10 1.0109760 1.2022263 1974-1988
2733 JP 11 0.7067914 0.0000000 1974-1988
2734 JP 12 1.0644547 1.0957792 1974-1988
2735 JP 13 0.6104061 0.7012987 1974-1988
2736 JP 14 0.6177233 NA 1974-1988

A key methodological step follows: we treat the entire corpus of AI patents as if it belonged to a single, hypothetical ‘country’ named AI_pat. This novel approach allows us to calculate the RTA for AI itself across all technological fields, providing a benchmark against which national specializations can be compared. The resulting data is saved for later use, and it looks like this:

kable(as.data.frame(region_tech_ai_1_df[region_tech_ai_1_df$ctry_code == "AI_pat",][1:6,]))
ctry_code techn_field_nr n_tech_reg
AI_pat 1 1.750000
AI_pat 2 2.250000
AI_pat 3 3.533333
AI_pat 4 2.750000
AI_pat 5 12.916667
AI_pat 6 279.433333
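The AI_pat step amounts to aggregating the fractionally weighted AI-patent subset over technological fields and appending it under a single country code, e.g. (base-R sketch with illustrative data):

```r
ai_df <- data.frame(                 # fractionally weighted AI patents
  techn_field_nr = c(6, 6, 4),
  field_weight   = c(0.5, 0.5, 1)
)

agg <- aggregate(field_weight ~ techn_field_nr, ai_df, sum)
ai_ctry <- data.frame(ctry_code      = "AI_pat",
                      techn_field_nr = agg$techn_field_nr,
                      n_tech_reg     = agg$field_weight)
ai_ctry  # rows for the hypothetical 'country' AI_pat, one per field
```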

Consolidating Data Across All Intervals

The calculation process detailed above is repeated for the remaining two intervals: 1989-2003 and 2004-2018. After processing all periods, the three interval-specific RTA files are combined into a single, comprehensive dataset named IPC_RCAs, which is saved for later use (Files_created_with_the_code/data/files_code_Fields_analysis/IPC_RCAs.csv). This file contains the complete history of general and AI-specific specializations for all countries across the three time periods. Using Japan again as an example, the file looks like this for each interval:

kable(as.data.frame(IPC_RCAs[IPC_RCAs$ctry_code == "JP" & IPC_RCAs$Period == "1974-1988",][1:6,]))
ctry_code techn_field_nr RCA_Gen RCA_AI Period
2731 JP 1 1.1268685 1.4025974 1974-1988
2732 JP 10 1.0109760 1.2022263 1974-1988
2733 JP 11 0.7067914 0.0000000 1974-1988
2734 JP 12 1.0644547 1.0957792 1974-1988
2735 JP 13 0.6104061 0.7012987 1974-1988
2736 JP 14 0.6177233 NA 1974-1988
kable(as.data.frame(IPC_RCAs[IPC_RCAs$ctry_code == "JP" & IPC_RCAs$Period == "1989-2003",][1:6,]))
ctry_code techn_field_nr RCA_Gen RCA_AI Period
9486 JP 1 1.1395992 0.9834465 1989-2003
9487 JP 10 1.0147327 0.8904603 1989-2003
9488 JP 11 0.6029495 0.6173858 1989-2003
9489 JP 12 1.0394907 0.9396071 1989-2003
9490 JP 13 0.5833007 0.5805270 1989-2003
9491 JP 14 0.6666774 1.8521575 1989-2003
kable(as.data.frame(IPC_RCAs[IPC_RCAs$ctry_code == "JP" & IPC_RCAs$Period == "2004-2018",][1:6,]))
ctry_code techn_field_nr RCA_Gen RCA_AI Period
18341 JP 1 1.2953810 1.3244132 2004-2018
18342 JP 10 0.8728225 0.9737969 2004-2018
18343 JP 11 0.5286471 0.9540950 2004-2018
18344 JP 12 0.8957888 1.1166269 2004-2018
18345 JP 13 0.8071723 1.3124353 2004-2018
18346 JP 14 0.5785004 3.3393324 2004-2018

To facilitate further analysis and visualization, we create consolidated summary files for each interval. These files combine the RTA scores of the four focus countries (US, CN, KR, JP) and the AI_pat entity into a single, wide-format table. The file for the first interval is named Files_created_with_the_code/data/files_code_Fields_analysis/Metrics_First_period.csv, with analogous names for the second and third intervals. The data for the first interval looks like this:

kable(as.data.frame(First_period[1:6,]))
techn_field_nr sector field_name RCA_US RCA_CN RCA_KR RCA_JP RCA_AI
1 Electrical engineering Electrical machinery, apparatus, energy 0.8061303 0.7812853 0.8859104 1.126869 0.0593493
2 Electrical engineering Audio-visual technology 0.5290351 0.2812052 1.5817366 1.341886 0.0853538
3 Electrical engineering Telecommunications 0.6577273 0.3468912 1.4004752 1.230989 0.3360644
4 Electrical engineering Digital communication 0.7356689 0.2069789 1.3043447 1.244439 1.0978493
5 Electrical engineering Basic communication processes 0.8740121 0.3793871 1.0785703 1.191508 1.6452825
6 Electrical engineering Computer technology 0.6008344 0.5349367 0.9466936 1.371771 16.6484958

Finally, these three interval-specific summary files are merged into a master file named All_periods, shown below. This file includes additional labels for analytical purposes, though these are not central to the paper’s main findings.

head(IPC_names)

In the last step of this sub-section, we use the IPC_RCAs.csv file to generate a summary table (IPC_RCAs_Top4). Here, the continuous RTA values are binarized: any RTA ≥ 1 is considered a specialization (value of 1) and any RTA < 1 is not (value of 0). We then sum these binary indicators to count the number of general specializations, AI-specific specializations, and coinciding specializations (where a country is specialized in both the general field and its AI-specific application) for each country and interval, resulting in the following dataset:

kable(as.data.frame(IPC_RCAs_Top4[1:6,]))
ctry_code techn_field_nr RCA_Gen RCA_AI Period Label Round_general Round_AI Total_RCA
CN 1 0.7812853 0 1974-1988 Electrical machinery, apparatus, energy 0 0 0
CN 10 1.2306427 0 1974-1988 Measurement 1 0 1
CN 11 0.9483087 0 1974-1988 Analysis of biological materials 0 0 0
CN 12 0.7424070 0 1974-1988 Control 0 0 0
CN 13 1.3542371 0 1974-1988 Medical technology 1 0 1
CN 14 0.9327427 0 1974-1988 Organic fine chemistry 0 0 0
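The binarization and counting can be sketched as follows (base R; the RCA column names follow IPC_RCAs, while the coincidence column name is chosen here for illustration):

```r
df <- data.frame(
  ctry_code = "CN",
  RCA_Gen   = c(0.78, 1.23, 0.95, 1.35),
  RCA_AI    = c(0,    0,    1.10, 1.35)
)

df$Round_general <- as.integer(df$RCA_Gen >= 1)
df$Round_AI      <- as.integer(df$RCA_AI  >= 1)
df$Coinciding    <- df$Round_general * df$Round_AI   # specialized in both

# Number of general, AI-specific, and coinciding specializations:
colSums(df[, c("Round_general", "Round_AI", "Coinciding")])
```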

1.2. Building the Global Technological Space (GTS)

The next step is to construct the backbone of our analysis: the Global Technological Space (GTS). This space is a network where nodes represent technological fields, and the links between them signify their relatedness. We measure this relatedness based on the principle that technologies that frequently appear together within the same patent are likely to be related.

1.2.1. From Patents to a Co-occurrence Matrix

To quantify this relationship, we must first count how often every possible pair of technologies co-occurs across the entire patent dataset. We start by loading the complete patent database (which, due to its size, is again handled in chunks) and applying the create_sparse_matrix function. This function generates a very large matrix where rows are unique patents and columns are the 35 technological fields.

The resulting sparse matrix, mat_tech_AI1, records the occurrences of each technological field within each patent (rows are patents, columns are fields, and zero cells indicate absence). It looks like this:

kable(as.matrix(mat_tech_AI1[1:20, 1:12]), caption = "Sample of the Sparse AI matrix")
Sample of the Sparse AI matrix
1 2 3 4 5 6 7 8 9 10 11 12
58 0 0 0 0 0 2 0 0 0 0 0 0
76 0 0 0 0 0 0 0 0 0 0 0 0
111 0 0 0 0 0 0 0 0 0 0 0 0
139 0 0 0 0 0 0 0 0 0 0 0 4
151 0 0 0 0 0 0 0 0 0 0 0 0
159 0 0 0 0 0 0 0 0 0 0 0 0
183 0 0 0 0 0 0 0 0 0 0 0 0
193 0 0 0 0 0 0 0 0 0 0 0 0
200 0 0 0 0 0 0 0 0 0 0 0 0
206 0 0 0 0 0 0 0 0 0 0 0 0
217 0 0 0 0 0 0 0 0 0 0 0 0
218 0 0 0 0 0 0 0 0 0 0 0 0
220 0 0 0 0 1 0 0 0 0 0 0 0
231 1 0 0 0 0 0 0 0 0 0 0 0
243 3 0 0 0 0 0 0 0 0 0 0 0
246 0 0 0 0 0 0 0 0 0 0 0 0
261 0 0 0 0 0 0 0 0 0 0 0 0
266 0 0 0 0 0 0 0 0 0 0 0 0
280 0 0 0 0 0 0 0 0 0 0 0 0
283 0 0 0 0 0 0 0 0 0 0 0 0

By calculating the cross-product of this matrix (t(M) %*% M), we transform it into a 35x35 square co-occurrence matrix. Each cell (i, j) in this new matrix contains a count of how many patents simultaneously list technology i and technology j. This square matrix looks like this:

kable(as.matrix(mat_tech_AI1[1:35, 1:35]), caption = "Sample of the co-occurrence matrix")
Sample of the co-occurrence matrix
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
5768289 180008 72869 35505 31634 105672 48931 293699 162448 224446 6043 98397 35221 26284 5711 2489 110241 2856 77403 250123 124491 58683 82108 29344 44158 57741 95078 21109 81035 73388 75597 213765 29184 26222 59692
180008 2531644 182176 197948 36765 435699 37718 187833 317265 91013 2012 90671 33259 5530 1220 688 28068 155 28650 18549 67392 7150 10257 9675 19330 25924 5935 26245 28330 22945 20173 54606 20783 33300 14997
72869 182176 2229334 679000 58436 355831 48692 13799 109871 138039 4223 95899 24034 622 435 139 1132 61 1243 2934 4601 2666 2396 1464 12865 2658 1948 76191 7767 6641 7014 28756 6655 13681 14208
35505 197948 679000 3345610 45697 596334 160911 2506 4448 81955 2157 110688 12585 912 776 93 117 77 108 95 731 166 688 973 4228 596 1913 1919 3500 3584 913 21166 8249 4925 7775
31634 36765 58436 45697 506310 70807 435 34596 4931 26343 380 12005 2645 672 189 39 25 9 222 1357 1170 2069 1471 442 393 448 914 819 399 535 583 2988 754 1838 743
105672 435699 355831 596334 70807 6260090 553534 95310 109728 269221 26784 283610 127652 6859 21330 4725 2902 1485 5442 10175 14978 3842 10562 7448 33521 12340 19023 80353 24244 12310 15613 70014 52055 38240 35601
48931 37718 48692 160911 435 553534 1701101 926 3984 40824 3310 142899 27438 840 1781 836 20 1255 640 518 466 50 2764 2572 15140 988 2243 4467 6398 2500 1003 15738 14459 5927 9893
293699 187833 13799 2506 34596 95310 926 2907011 220961 114839 2712 10402 12349 102013 2263 495 78495 128 184747 59457 199001 53804 45908 14057 22677 62331 10748 16159 34466 19714 5928 6698 1232 8552 7164
162448 317265 109871 4448 4931 109728 3984 220961 2596916 106349 2321 22262 45652 25607 3488 1208 103336 28 86394 25865 88924 25285 18939 5130 47088 21196 8940 82423 75161 5256 24813 19059 4992 13602 5614
224446 91013 138039 81955 26343 269221 40824 114839 106349 5893276 237979 213472 103998 43521 104810 29536 16762 5190 47689 31626 33031 50026 95527 35712 45455 40922 53426 18286 43759 27833 56718 138519 17664 22110 80910
6043 2012 4223 2157 380 26784 3310 2712 2321 237979 810267 8274 27172 48107 261182 96945 5074 7045 10954 5843 4142 9304 24371 6285 2166 1909 2287 2120 15562 1142 1111 2364 730 1079 8314
98397 90671 95899 110688 12005 283610 142899 10402 22262 213472 8274 2064137 43797 1902 1012 7803 1243 2340 5778 5974 3131 680 13263 16084 44920 20787 17820 8993 26462 23917 18569 132219 46615 26491 49748
35221 33259 24034 12585 2645 127652 27438 12349 45652 103998 27172 43797 2429739 16945 35668 125523 31251 7933 25452 19786 31515 8844 59686 46015 48773 14450 20017 22087 52776 21157 14402 19788 54425 44840 10065
26284 5530 622 912 672 6859 840 102013 25607 43521 48107 1902 16945 2867623 218485 880680 142222 72141 366020 46278 19938 9300 345426 30433 2091 4094 1825 14666 19353 2021 1537 1447 1383 10727 1578
5711 1220 435 776 189 21330 1781 2263 3488 104810 261182 1012 35668 218485 2548809 536552 39796 301160 132050 8297 5668 10986 43438 67992 2818 10729 2211 11092 67652 1160 1394 659 404 5087 1955
2489 688 139 93 39 4725 836 495 1208 29536 96945 7803 125523 880680 536552 3083488 65679 234071 85334 13210 6219 17217 24794 3463 2974 1429 1446 4794 34654 355 181 425 2645 4469 240
110241 28068 1132 117 25 2902 20 78495 103336 16762 5074 1243 31251 142222 39796 65679 2049722 20411 343204 68382 104006 14426 94806 28112 20540 9678 4330 88256 501786 2040 23407 36789 6147 22655 24444
2856 155 61 77 9 1485 1255 128 28 5190 7045 2340 7933 72141 301160 234071 20411 1898838 59224 3902 1959 774 26817 9471 15336 2731 264 2880 128187 4135 2318 268 11201 7095 750
77403 28650 1243 108 222 5442 640 184747 86394 47689 10954 5778 25452 366020 132050 85334 343204 59224 3281112 126796 133673 28416 223167 106935 15567 32978 13952 101627 217172 28914 20605 9411 7559 37134 77863
250123 18549 2934 95 1357 10175 518 59457 25865 31626 5843 5974 19786 46278 8297 13210 68382 3902 126796 3478542 211364 137164 225615 137648 8314 183631 36572 20432 114726 66431 32638 11411 2704 9178 81833
124491 67392 4601 731 1170 14978 466 199001 88924 33031 4142 3131 31515 19938 5668 6219 104006 1959 133673 211364 1789373 35420 95519 27346 49898 77804 25745 67269 135262 13007 32746 24564 9689 43825 44971
58683 7150 2666 166 2069 3842 50 53804 25285 50026 9304 680 8844 9300 10986 17217 14426 774 28416 137164 35420 405306 39145 5423 1106 4918 2601 10598 11263 1510 2071 616 149 1395 368
82108 10257 2396 688 1471 10562 2764 45908 18939 95527 24371 13263 59686 345426 43438 24794 94806 26817 223167 225615 95519 39145 2731402 479200 47416 48496 45189 78191 90207 51070 31742 18160 23655 36675 39360
29344 9675 1464 973 442 7448 2572 14057 5130 35712 6285 16084 46015 30433 67992 3463 28112 9471 106935 137648 27346 5423 479200 2069291 14231 24015 83658 8483 46258 84902 14969 28141 7024 8805 52309
44158 19330 12865 4228 393 33521 15140 22677 47088 45455 2166 44920 48773 2091 2818 2974 20540 15336 15567 8314 49898 1106 47416 14231 1786154 54803 10446 51364 57020 9815 60074 63146 39448 41268 47536
57741 25924 2658 596 448 12340 988 62331 21196 40922 1909 20787 14450 4094 10729 1429 9678 2731 32978 183631 77804 4918 48496 24015 54803 2329323 32652 15113 61081 22827 74156 34357 9904 14452 31784
95078 5935 1948 1913 914 19023 2243 10748 8940 53426 2287 17820 20017 1825 2211 1446 4330 264 13952 36572 25745 2601 45189 83658 10446 32652 1816529 2404 17697 74745 123929 100390 5100 7191 39369
21109 26245 76191 1919 819 80353 4467 16159 82423 18286 2120 8993 22087 14666 11092 4794 88256 2880 101627 20432 67269 10598 78191 8483 51364 15113 2404 1308730 42673 1881 8205 6575 13946 59257 7957
81035 28330 7767 3500 399 24244 6398 34466 75161 43759 15562 26462 52776 19353 67652 34654 501786 128187 217172 114726 135262 11263 90207 46258 57020 61081 17697 42673 3126645 15300 49310 69364 18062 29144 90968
73388 22945 6641 3584 535 12310 2500 19714 5256 27833 1142 23917 21157 2021 1160 355 2040 4135 28914 66431 13007 1510 51070 84902 9815 22827 74745 1881 15300 1463621 30728 33966 23370 41410 31058
75597 20173 7014 913 583 15613 1003 5928 24813 56718 1111 18569 14402 1537 1394 181 23407 2318 20605 32638 32746 2071 31742 14969 60074 74156 123929 8205 49310 30728 1831696 212337 19557 16086 103204
213765 54606 28756 21166 2988 70014 15738 6698 19059 138519 2364 132219 19788 1447 659 425 36789 268 9411 11411 24564 616 18160 28141 63146 34357 100390 6575 69364 33966 212337 2772775 40099 19890 113411
29184 20783 6655 8249 754 52055 14459 1232 4992 17664 730 46615 54425 1383 404 2645 6147 11201 7559 2704 9689 149 23655 7024 39448 9904 5100 13946 18062 23370 19557 40099 1236189 41595 36894
26222 33300 13681 4925 1838 38240 5927 8552 13602 22110 1079 26491 44840 10727 5087 4469 22655 7095 37134 9178 43825 1395 36675 8805 41268 14452 7191 59257 29144 41410 16086 19890 41595 1084525 19702
59692 14997 14208 7775 743 35601 9893 7164 5614 80910 8314 49748 10065 1578 1955 240 24444 750 77863 81833 44971 368 39360 52309 47536 31784 39369 7957 90968 31058 103204 113411 36894 19702 3560637
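The cross-product step can be reproduced in base R on a toy patent-by-field incidence matrix (binary here for clarity; crossprod(M) is equivalent to t(M) %*% M):

```r
M <- rbind(            # rows = patents, columns = technological fields
  p1 = c(1, 1, 0),
  p2 = c(1, 0, 1),
  p3 = c(0, 1, 0)
)

co <- crossprod(M)     # 3x3 field-by-field co-occurrence matrix
co                     # cell (i, j): patents listing both field i and j
```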

After processing all data chunks, the individual co-occurrence matrices are summed to create a final, comprehensive matrix, which is then saved as Matrix_IPC.csv. This file looks like this:

kable(as.matrix(mat_tech_AI_Final[1:35, 1:35]))
1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25 26 27 28 29 3 30 31 32 33 34 35 4 5 6 7 8 9
1 26242833 1105050 36467 460300 170587 113955 20807 10445 533998 13709 443392 1119478 1469546 724137 219144 419139 145875 237008 327159 460192 142474 432784 362362 386632 400546 915437 146905 128408 278714 143077 202650 445672 189070 1432517 832079
10 1105050 24682955 1084761 1050741 563614 204173 409432 106582 77728 22386 201039 563726 174826 157456 173075 524854 184676 244039 229919 330778 114862 238130 613474 135633 274385 662630 82825 127036 366014 325019 169190 1223744 160542 686541 615023
11 36467 1084761 3525353 34621 131313 253957 1242197 563632 30836 30137 52245 8619 26056 18887 27249 149002 37931 12659 7065 16227 13331 91847 16588 7282 7332 10983 3331 6630 36342 8870 1162 111402 16033 12597 15757
12 460300 1050741 34621 8999453 164516 8290 6446 22004 7817 10141 23760 438005 38608 22339 3116 76178 72363 304054 176239 119692 64315 145060 454421 131213 126129 612923 219052 145379 230550 479100 71090 1314660 735780 73831 123855
13 170587 563614 131313 164516 10467208 97468 155323 563556 179377 44737 152775 198152 100743 153392 27220 325463 213020 217590 74190 94111 127019 252392 86787 92050 71485 88497 261890 209610 43541 46635 15821 492137 120184 50575 349238
14 113955 204173 253957 8290 97468 15116843 1152360 4615828 854888 337538 2235446 47780 231642 104417 31400 1755442 130941 15050 32237 13592 121893 105799 4913 13022 8727 7206 9877 61408 8633 4488 2653 32031 3834 365327 252442
15 20807 409432 1242197 6446 155323 1152360 10401074 2707298 181385 1089819 541154 6197 35052 22737 33307 221998 260273 11041 36376 11044 50116 273218 2593 6068 5164 3185 1754 20616 10697 3065 992 93726 7143 9801 17505
16 10445 106582 563632 22004 563556 4615828 2707298 13025114 316366 896097 390863 4455 67891 24447 56329 152120 17747 15281 8564 7183 22389 160764 2314 2210 3323 3009 11232 18056 1797 1814 290 20856 3661 3727 11026
17 533998 77728 30836 7817 179377 854888 181385 316366 10676610 93923 1944663 240044 362392 579773 46352 527703 125750 125085 56172 24044 488151 2274295 6304 11193 125667 157364 37219 102186 119196 1398 869 11786 1054 379736 640378
18 13709 22386 30137 10141 44737 337538 1089819 896097 93923 7317728 265045 1573 21057 15419 2554 153172 47742 85630 14007 2876 16674 531516 725 19936 5232 2287 52233 39290 2513 458 190 7036 4409 1018 1250
19 443392 201039 52245 23760 152775 2235446 541154 390863 1944663 265045 15422287 356341 685512 733594 101637 1108478 494356 87337 184865 88232 601362 1012771 9616 146983 126105 59265 42519 179676 364466 1428 1584 31035 2739 703190 591978
2 1119478 563726 8619 438005 198152 47780 6197 4455 240044 1573 356341 14335959 199276 506317 30510 68765 45202 147285 248275 31077 296926 232699 993581 90727 115040 239762 99615 210021 88928 798019 300865 2073525 166538 1132716 1778205
20 1469546 174826 26056 38608 100743 231642 35052 67891 362392 21057 685512 199276 16386118 1158401 438789 1165654 675668 57671 922532 232388 145835 662971 24974 463218 200267 71269 19752 51377 386575 895 7543 39318 5415 384304 204091
21 724137 157456 18887 22339 153392 104417 22737 24447 579773 15419 733594 506317 1158401 9278629 112480 555537 145430 295880 425365 135681 373035 890035 21894 90091 209204 139276 62699 239564 262458 1899 8101 55552 1756 1171768 496639
22 219144 173075 27249 3116 27220 31400 33307 56329 46352 2554 101637 30510 438789 112480 1324026 133465 21055 5819 16226 12957 36658 36419 10691 4051 9998 3684 315 4544 1693 584 6949 12485 156 184721 90594
23 419139 524854 149002 76178 325463 1755442 221998 152120 527703 153172 1108478 68765 1165654 555537 133465 13214230 2323343 276808 263347 265500 430127 548439 15323 302186 168414 92185 126170 174306 215646 4346 8945 52088 10773 282961 135042
24 145875 184676 37931 72363 213020 130941 260273 17747 125750 47742 494356 45202 675668 145430 21055 2323343 9199630 75744 138952 460881 58240 222304 13785 446346 85347 133339 49967 59730 271865 5944 5790 35730 19378 73581 39531
25 237008 244039 12659 304054 217590 15050 11041 15281 125085 85630 87337 147285 57671 295880 5819 276808 75744 9267812 360892 62498 397629 374961 93698 55575 293059 316170 190775 221657 262033 15775 2484 174082 109065 139063 447847
26 327159 229919 7065 176239 74190 32237 36376 8564 56172 14007 184865 248275 922532 425365 16226 263347 138952 360892 11407587 174310 109204 347920 16900 127266 387707 170519 52267 84352 187657 2672 4826 64361 21471 348827 140327
27 460192 330778 16227 119692 94111 13592 11044 7183 24044 2876 88232 31077 232388 135681 12957 265500 460881 62498 174310 9032042 15843 88766 13427 374929 659505 542172 27189 35329 178042 6855 7045 81588 11568 64121 60694
28 142474 114862 13331 64315 127019 121893 50116 22389 488151 16674 601362 296926 145835 373035 36658 430127 58240 397629 109204 15843 8187901 299707 477030 14857 59083 41079 76670 340332 45916 15540 6822 501207 28035 93685 702628
29 432784 238130 91847 145060 252392 105799 273218 160764 2274295 531516 1012771 232699 662971 890035 36419 548439 222304 374961 347920 88766 299707 15132833 39136 87262 321723 405485 114989 184052 470602 13139 3479 104021 33017 168451 462908
3 362362 613474 16588 454421 86787 4913 2593 2314 6304 725 9616 993581 24974 21894 10691 15323 13785 93698 16900 13427 477030 39136 10554590 37350 34597 153809 31671 79752 71338 2820312 406133 1706302 220822 112782 760568
30 386632 135633 7282 131213 92050 13022 6068 2210 11193 19936 146983 90727 463218 90091 4051 302186 446346 55575 127266 374929 14857 87262 37350 7500577 170350 167141 134233 209624 166601 13797 4839 53130 12660 101086 24403
31 400546 274385 7332 126129 71485 8727 5164 3323 125667 5232 126105 115040 200267 209204 9998 168414 85347 293059 387707 659505 59083 321723 34597 170350 8975847 1094346 95790 86359 577745 3925 5318 73536 3825 34116 156239
32 915437 662630 10983 612923 88497 7206 3185 3009 157364 2287 59265 239762 71269 139276 3684 92185 133339 316170 170519 542172 41079 405485 153809 167141 1094346 12649692 175913 120849 544994 93460 18938 275421 64206 32146 92514
33 146905 82825 3331 219052 261890 9877 1754 11232 37219 52233 42519 99615 19752 62699 315 126170 49967 190775 52267 27189 76670 114989 31671 134233 95790 175913 5758204 219012 209748 38328 3621 208961 55401 5833 22884
34 128408 127036 6630 145379 209610 61408 20616 18056 102186 39290 179676 210021 51377 239564 4544 174306 59730 221657 84352 35329 340332 184052 79752 209624 86359 120849 219012 5290424 120742 30035 22367 242998 39460 45044 97315
35 278714 366014 36342 230550 43541 8633 10697 1797 119196 2513 364466 88928 386575 262458 1693 215646 271865 262033 187657 178042 45916 470602 71338 166601 577745 544994 209748 120742 15805378 29060 4422 155283 59391 40477 32646
4 143077 325019 8870 479100 46635 4488 3065 1814 1398 458 1428 798019 895 1899 584 4346 5944 15775 2672 6855 15540 13139 2820312 13797 3925 93460 38328 30035 29060 12132499 261134 2358962 648691 11314 27124
5 202650 169190 1162 71090 15821 2653 992 290 869 190 1584 300865 7543 8101 6949 8945 5790 2484 4826 7045 6822 3479 406133 4839 5318 18938 3621 22367 4422 261134 2871031 435879 3267 212314 34443
6 445672 1223744 111402 1314660 492137 32031 93726 20856 11786 7036 31035 2073525 39318 55552 12485 52088 35730 174082 64361 81588 501207 104021 1706302 53130 73536 275421 208961 242998 155283 2358962 435879 25287889 2302961 544407 544095
7 189070 160542 16033 735780 120184 3834 7143 3661 1054 4409 2739 166538 5415 1756 156 10773 19378 109065 21471 11568 28035 33017 220822 12660 3825 64206 55401 39460 59391 648691 3267 2302961 6633202 7694 19252
8 1432517 686541 12597 73831 50575 365327 9801 3727 379736 1018 703190 1132716 384304 1171768 184721 282961 73581 139063 348827 64121 93685 168451 112782 101086 34116 32146 5833 45044 40477 11314 212314 544407 7694 13624917 1366521
9 832079 615023 15757 123855 349238 252442 17505 11026 640378 1250 591978 1778205 204091 496639 90594 135042 39531 447847 140327 60694 702628 462908 760568 24403 156239 92514 22884 97315 32646 27124 34443 544095 19252 1366521 15211979

1.2.2. Calculating Relatedness and Defining the Network

Raw co-occurrence counts can be misleading, as highly prevalent technologies will naturally co-occur more often with others, inflating their apparent relatedness. To correct for this, we normalize the matrix using the relatedness() function from the EconGeo package, which employs a cosine similarity index. The result is a relatedness matrix, where each value represents the strength of the relationship between two technologies. It looks like this:

kable(as.matrix(mat_tech_rel_AI[1:35, 1:35]))
1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25 26 27 28 29 3 30 31 32 33 34 35 4 5 6 7 8 9
1 0.0000000 0.0799467 0.0047211 0.0428721 0.0184079 0.0082334 0.0018598 0.0008350 0.0429860 0.0018410 0.0311911 0.0820082 0.1179296 0.0604547 0.0418364 0.0305216 0.0143538 0.0260261 0.0358266 0.0557421 0.0148884 0.0333604 0.0305611 0.0501727 0.0429894 0.0878084 0.0227148 0.0169696 0.0303031 0.0131214 0.0355793 0.0294009 0.0220326 0.1174738 0.0653999
10 0.0799467 0.0000000 0.1470137 0.1024491 0.0636678 0.0154428 0.0383112 0.0089198 0.0065500 0.0031470 0.0148048 0.0432303 0.0146867 0.0137609 0.0345890 0.0400098 0.0190229 0.0280533 0.0263573 0.0419430 0.0125652 0.0192156 0.0541629 0.0184253 0.0308282 0.0665361 0.0134064 0.0175746 0.0416586 0.0312030 0.0310960 0.0845112 0.0195844 0.0589368 0.0506037
11 0.0047211 0.1470137 0.0000000 0.0060406 0.0265443 0.0343727 0.2079985 0.0844101 0.0046500 0.0075813 0.0068848 0.0011828 0.0039170 0.0029538 0.0097450 0.0203257 0.0069918 0.0026041 0.0014493 0.0036820 0.0026096 0.0132627 0.0026208 0.0017702 0.0014741 0.0019735 0.0009648 0.0016413 0.0074019 0.0015238 0.0003822 0.0137671 0.0035000 0.0019351 0.0023200
12 0.0428721 0.1024491 0.0060406 0.0000000 0.0239255 0.0008072 0.0007765 0.0023708 0.0008481 0.0018353 0.0022526 0.0432429 0.0041755 0.0025134 0.0008017 0.0074761 0.0095962 0.0449977 0.0260101 0.0195390 0.0090577 0.0150696 0.0516510 0.0229477 0.0182439 0.0792333 0.0456472 0.0258926 0.0337822 0.0592146 0.0168210 0.1168833 0.1155540 0.0081597 0.0131196
13 0.0184079 0.0636678 0.0265443 0.0239255 0.0000000 0.0109959 0.0216780 0.0703477 0.0225462 0.0093805 0.0167809 0.0226652 0.0126233 0.0199955 0.0081140 0.0370058 0.0327285 0.0373082 0.0126856 0.0177993 0.0207253 0.0303777 0.0114288 0.0186514 0.0119796 0.0132543 0.0632283 0.0432526 0.0073917 0.0066779 0.0043371 0.0506933 0.0218680 0.0064758 0.0428601
14 0.0082334 0.0154428 0.0343727 0.0008072 0.0109959 0.0000000 0.1076864 0.3857904 0.0719457 0.0473881 0.1644049 0.0036593 0.0194341 0.0091136 0.0062670 0.1336422 0.0134701 0.0017278 0.0036907 0.0017212 0.0133168 0.0085261 0.0004332 0.0017667 0.0009792 0.0007226 0.0015966 0.0084843 0.0009813 0.0004303 0.0004870 0.0022091 0.0004671 0.0313207 0.0207435
15 0.0018598 0.0383112 0.2079985 0.0007765 0.0216780 0.1076864 0.0000000 0.2799333 0.0188849 0.1892857 0.0492366 0.0005871 0.0036381 0.0024551 0.0082240 0.0209085 0.0331239 0.0015681 0.0051521 0.0017302 0.0067735 0.0272392 0.0002828 0.0010184 0.0007168 0.0003951 0.0003508 0.0035238 0.0015042 0.0003636 0.0002253 0.0079971 0.0010766 0.0010395 0.0017795
16 0.0008350 0.0089198 0.0844101 0.0023708 0.0703477 0.3857904 0.2799333 0.0000000 0.0294599 0.1392026 0.0318069 0.0003775 0.0063024 0.0023610 0.0124397 0.0128141 0.0020201 0.0019411 0.0010849 0.0010065 0.0027065 0.0143352 0.0002258 0.0003318 0.0004126 0.0003339 0.0020090 0.0027603 0.0002260 0.0001924 0.0000589 0.0015916 0.0004935 0.0003536 0.0010025
17 0.0429860 0.0065500 0.0046500 0.0008481 0.0225462 0.0719457 0.0188849 0.0294599 0.0000000 0.0146912 0.1593436 0.0204824 0.0338740 0.0563787 0.0103072 0.0447596 0.0144126 0.0159992 0.0071650 0.0033923 0.0594177 0.2041996 0.0006193 0.0016919 0.0157101 0.0175817 0.0067032 0.0157297 0.0150952 0.0001493 0.0001777 0.0009056 0.0001431 0.0362719 0.0586268
18 0.0018410 0.0031470 0.0075813 0.0018353 0.0093805 0.0473881 0.1892857 0.1392026 0.0146912 0.0000000 0.0362294 0.0002239 0.0032835 0.0025013 0.0009474 0.0216734 0.0091282 0.0182713 0.0029805 0.0006769 0.0033857 0.0796114 0.0001188 0.0050270 0.0010911 0.0004263 0.0156934 0.0100893 0.0005309 0.0000816 0.0000648 0.0009019 0.0009983 0.0001622 0.0001909
19 0.0311911 0.0148048 0.0068848 0.0022526 0.0167809 0.1644049 0.0492366 0.0318069 0.1593436 0.0362294 0.0000000 0.0265711 0.0559960 0.0623401 0.0197506 0.0821635 0.0495142 0.0097622 0.0206065 0.0108786 0.0639664 0.0794647 0.0008255 0.0194151 0.0137767 0.0057864 0.0066920 0.0241698 0.0403356 0.0001333 0.0002831 0.0020840 0.0003249 0.0586971 0.0473610
2 0.0820082 0.0432303 0.0011828 0.0432429 0.0226652 0.0036593 0.0005871 0.0003775 0.0204824 0.0002239 0.0265711 0.0000000 0.0169510 0.0448057 0.0061740 0.0053078 0.0047146 0.0171437 0.0288192 0.0039901 0.0328900 0.0190133 0.0888243 0.0124798 0.0130876 0.0243775 0.0163267 0.0294202 0.0102487 0.0775754 0.0559918 0.1449960 0.0205711 0.0984609 0.1481480
20 0.1179296 0.0146867 0.0039170 0.0041755 0.0126233 0.0194341 0.0036381 0.0063024 0.0338740 0.0032835 0.0559960 0.0169510 0.0000000 0.1122969 0.0972703 0.0985640 0.0772005 0.0073536 0.1173081 0.0326857 0.0176960 0.0593409 0.0024458 0.0697998 0.0249585 0.0079379 0.0035464 0.0078840 0.0488047 0.0000953 0.0015378 0.0030119 0.0007327 0.0365945 0.0186267
21 0.0604547 0.0137609 0.0029538 0.0025134 0.0199955 0.0091136 0.0024551 0.0023610 0.0563787 0.0025013 0.0623401 0.0448057 0.1122969 0.0000000 0.0259400 0.0488688 0.0172866 0.0392492 0.0562701 0.0198533 0.0470904 0.0828775 0.0022306 0.0141228 0.0271237 0.0161381 0.0117112 0.0382447 0.0344713 0.0002104 0.0017181 0.0044270 0.0002472 0.1160786 0.0471544
22 0.0418364 0.0345890 0.0097450 0.0008017 0.0081140 0.0062670 0.0082240 0.0124397 0.0103072 0.0009474 0.0197506 0.0061740 0.0972703 0.0259400 0.0000000 0.0268474 0.0057230 0.0017651 0.0049084 0.0043354 0.0105820 0.0077548 0.0024908 0.0014522 0.0029642 0.0009761 0.0001345 0.0016588 0.0005085 0.0001479 0.0033702 0.0022752 0.0000502 0.0418449 0.0196697
23 0.0305216 0.0400098 0.0203257 0.0074761 0.0370058 0.1336422 0.0209085 0.0128141 0.0447596 0.0216734 0.0821635 0.0053078 0.0985640 0.0488688 0.0268474 0.0000000 0.2408849 0.0320283 0.0303868 0.0338858 0.0473609 0.0445449 0.0013617 0.0413193 0.0190457 0.0093170 0.0205560 0.0242718 0.0247047 0.0004200 0.0016548 0.0036207 0.0013228 0.0244499 0.0111838
24 0.0143538 0.0190229 0.0069918 0.0095962 0.0327285 0.0134701 0.0331239 0.0020201 0.0144126 0.0091282 0.0495142 0.0047146 0.0772005 0.0172866 0.0057230 0.2408849 0.0000000 0.0118424 0.0216650 0.0794840 0.0086653 0.0243980 0.0016553 0.0824685 0.0130420 0.0182101 0.0110003 0.0112388 0.0420851 0.0007761 0.0014474 0.0033560 0.0032151 0.0085912 0.0044238
25 0.0260261 0.0280533 0.0026041 0.0449977 0.0373082 0.0017278 0.0015681 0.0019411 0.0159992 0.0182713 0.0097622 0.0171437 0.0073536 0.0392492 0.0017651 0.0320283 0.0118424 0.0000000 0.0627957 0.0120286 0.0660233 0.0459253 0.0125563 0.0114592 0.0499770 0.0481874 0.0468705 0.0465444 0.0452679 0.0022987 0.0006930 0.0182475 0.0201945 0.0181200 0.0559304
26 0.0358266 0.0263573 0.0014493 0.0260101 0.0126856 0.0036907 0.0051521 0.0010849 0.0071650 0.0029805 0.0206065 0.0288192 0.1173081 0.0562701 0.0049084 0.0303868 0.0216650 0.0627957 0.0000000 0.0334559 0.0180825 0.0424959 0.0022585 0.0261691 0.0659356 0.0259172 0.0128058 0.0176637 0.0323296 0.0003883 0.0013426 0.0067278 0.0039646 0.0453271 0.0174767
27 0.0557421 0.0419430 0.0036820 0.0195390 0.0177993 0.0017212 0.0017302 0.0010065 0.0033923 0.0006769 0.0108786 0.0039901 0.0326857 0.0198533 0.0043354 0.0338858 0.0794840 0.0120286 0.0334559 0.0000000 0.0029017 0.0119925 0.0019848 0.0852751 0.1240599 0.0911484 0.0073684 0.0081831 0.0339277 0.0011018 0.0021679 0.0094336 0.0023627 0.0092161 0.0083611
28 0.0148884 0.0125652 0.0026096 0.0090577 0.0207253 0.0133168 0.0067735 0.0027065 0.0594177 0.0033857 0.0639664 0.0328900 0.0176960 0.0470904 0.0105820 0.0473609 0.0086653 0.0660233 0.0180825 0.0029017 0.0000000 0.0349326 0.0608340 0.0029152 0.0095884 0.0059580 0.0179255 0.0680075 0.0075486 0.0021549 0.0018111 0.0499961 0.0049399 0.0116168 0.0835048
29 0.0333604 0.0192156 0.0132627 0.0150696 0.0303777 0.0085261 0.0272392 0.0143352 0.2041996 0.0796114 0.0794647 0.0190133 0.0593409 0.0828775 0.0077548 0.0445449 0.0243980 0.0459253 0.0424959 0.0119925 0.0349326 0.0000000 0.0036815 0.0126303 0.0385134 0.0433813 0.0198312 0.0271295 0.0570693 0.0013440 0.0006813 0.0076540 0.0042914 0.0154076 0.0405814
3 0.0305611 0.0541629 0.0026208 0.0516510 0.0114288 0.0004332 0.0002828 0.0002258 0.0006193 0.0001188 0.0008255 0.0888243 0.0024458 0.0022306 0.0024908 0.0013617 0.0016553 0.0125563 0.0022585 0.0019848 0.0608340 0.0036815 0.0000000 0.0059149 0.0045314 0.0180043 0.0059762 0.0128620 0.0094653 0.3156408 0.0870174 0.1373687 0.0314031 0.0112867 0.0729520
30 0.0501727 0.0184253 0.0017702 0.0229477 0.0186514 0.0017667 0.0010184 0.0003318 0.0016919 0.0050270 0.0194151 0.0124798 0.0697998 0.0141228 0.0014522 0.0413193 0.0824685 0.0114592 0.0261691 0.0852751 0.0029152 0.0126303 0.0059149 0.0000000 0.0343305 0.0301037 0.0389728 0.0520177 0.0340123 0.0023759 0.0015953 0.0065813 0.0027702 0.0155654 0.0036015
31 0.0429894 0.0308282 0.0014741 0.0182439 0.0119796 0.0009792 0.0007168 0.0004126 0.0157101 0.0010911 0.0137767 0.0130876 0.0249585 0.0271237 0.0029642 0.0190457 0.0130420 0.0499770 0.0659356 0.1240599 0.0095884 0.0385134 0.0045314 0.0343305 0.0000000 0.1630165 0.0230018 0.0177238 0.0975515 0.0005590 0.0014500 0.0075338 0.0006922 0.0043448 0.0190709
32 0.0878084 0.0665361 0.0019735 0.0792333 0.0132543 0.0007226 0.0003951 0.0003339 0.0175817 0.0004263 0.0057864 0.0243775 0.0079379 0.0161381 0.0009761 0.0093170 0.0182101 0.0481874 0.0259172 0.0911484 0.0059580 0.0433813 0.0180043 0.0301037 0.1630165 0.0000000 0.0377519 0.0221662 0.0822409 0.0118960 0.0046148 0.0252180 0.0103845 0.0036588 0.0100922
33 0.0227148 0.0134064 0.0009648 0.0456472 0.0632283 0.0015966 0.0003508 0.0020090 0.0067032 0.0156934 0.0066920 0.0163267 0.0035464 0.0117112 0.0001345 0.0205560 0.0110003 0.0468705 0.0128058 0.0073684 0.0179255 0.0198312 0.0059762 0.0389728 0.0230018 0.0377519 0.0000000 0.0647562 0.0510222 0.0078643 0.0014224 0.0308421 0.0144442 0.0010702 0.0040242
34 0.0169696 0.0175746 0.0016413 0.0258926 0.0432526 0.0084843 0.0035238 0.0027603 0.0157297 0.0100893 0.0241698 0.0294202 0.0078840 0.0382447 0.0016588 0.0242718 0.0112388 0.0465444 0.0176637 0.0081831 0.0680075 0.0271295 0.0128620 0.0520177 0.0177238 0.0221662 0.0647562 0.0000000 0.0251031 0.0052672 0.0075093 0.0306541 0.0087931 0.0070635 0.0146262
35 0.0303031 0.0416586 0.0074019 0.0337822 0.0073917 0.0009813 0.0015042 0.0002260 0.0150952 0.0005309 0.0403356 0.0102487 0.0488047 0.0344713 0.0005085 0.0247047 0.0420851 0.0452679 0.0323296 0.0339277 0.0075486 0.0570693 0.0094653 0.0340123 0.0975515 0.0822409 0.0510222 0.0251031 0.0000000 0.0041927 0.0012214 0.0161160 0.0108881 0.0052220 0.0040367
4 0.0131214 0.0312030 0.0015238 0.0592146 0.0066779 0.0004303 0.0003636 0.0001924 0.0001493 0.0000816 0.0001333 0.0775754 0.0000953 0.0002104 0.0001479 0.0004200 0.0007761 0.0022987 0.0003883 0.0011018 0.0021549 0.0013440 0.3156408 0.0023759 0.0005590 0.0118960 0.0078643 0.0052672 0.0041927 0.0000000 0.0608392 0.2065071 0.1003113 0.0012312 0.0028290
5 0.0355793 0.0310960 0.0003822 0.0168210 0.0043371 0.0004870 0.0002253 0.0000589 0.0001777 0.0000648 0.0002831 0.0559918 0.0015378 0.0017181 0.0033702 0.0016548 0.0014474 0.0006930 0.0013426 0.0021679 0.0018111 0.0006813 0.0870174 0.0015953 0.0014500 0.0046148 0.0014224 0.0075093 0.0012214 0.0608392 0.0000000 0.0730503 0.0009672 0.0442314 0.0068774
6 0.0294009 0.0845112 0.0137671 0.1168833 0.0506933 0.0022091 0.0079971 0.0015916 0.0009056 0.0009019 0.0020840 0.1449960 0.0030119 0.0044270 0.0022752 0.0036207 0.0033560 0.0182475 0.0067278 0.0094336 0.0499961 0.0076540 0.1373687 0.0065813 0.0075338 0.0252180 0.0308421 0.0306541 0.0161160 0.2065071 0.0730503 0.0000000 0.2561738 0.0426157 0.0408218
7 0.0220326 0.0195844 0.0035000 0.1155540 0.0218680 0.0004671 0.0010766 0.0004935 0.0001431 0.0009983 0.0003249 0.0205711 0.0007327 0.0002472 0.0000502 0.0013228 0.0032151 0.0201945 0.0039646 0.0023627 0.0049399 0.0042914 0.0314031 0.0027702 0.0006922 0.0103845 0.0144442 0.0087931 0.0108881 0.1003113 0.0009672 0.2561738 0.0000000 0.0010639 0.0025515
8 0.1174738 0.0589368 0.0019351 0.0081597 0.0064758 0.0313207 0.0010395 0.0003536 0.0362719 0.0001622 0.0586971 0.0984609 0.0365945 0.1160786 0.0418449 0.0244499 0.0085912 0.0181200 0.0453271 0.0092161 0.0116168 0.0154076 0.0112867 0.0155654 0.0043448 0.0036588 0.0010702 0.0070635 0.0052220 0.0012312 0.0442314 0.0426157 0.0010639 0.0000000 0.1274471
9 0.0653999 0.0506037 0.0023200 0.0131196 0.0428601 0.0207435 0.0017795 0.0010025 0.0586268 0.0001909 0.0473610 0.1481480 0.0186267 0.0471544 0.0196697 0.0111838 0.0044238 0.0559304 0.0174767 0.0083611 0.0835048 0.0405814 0.0729520 0.0036015 0.0190709 0.0100922 0.0040242 0.0146262 0.0040367 0.0028290 0.0068774 0.0408218 0.0025515 0.1274471 0.0000000
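
To make the normalization step concrete, the cosine idea behind it can be illustrated on a toy matrix. Note that this is only a sketch of the cosine-similarity principle; the actual computation in the pipeline is done by relatedness() from the EconGeo package, and the values below are invented.

```r
# Minimal sketch of cosine-normalised relatedness on a toy 3x3 co-occurrence matrix.
# Values are illustrative, not the paper's data; EconGeo::relatedness() is used in practice.
co <- matrix(c( 0, 30,  5,
               30,  0,  2,
                5,  2,  0), nrow = 3, byrow = TRUE,
             dimnames = list(1:3, 1:3))
norms <- sqrt(rowSums(co^2))                 # length of each field's co-occurrence profile
rel <- (co %*% t(co)) / outer(norms, norms)  # cosine similarity between profiles
diag(rel) <- 0                               # self-relatedness set to zero, as in the matrix above
round(rel, 3)
```

As in the printed relatedness matrix, the result is symmetric, bounded between 0 and 1, and has a zero diagonal.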

With the relatedness matrix complete, we can now treat it as an adjacency matrix to build a network graph (g_tech_AI). Each node's eigenvector centrality is calculated to capture its importance in the network. For visual clarity in later plots, links with below-average weight (relatedness) are filtered out. Finally, a Fruchterman-Reingold layout algorithm determines the spatial coordinates (coords_tech_AI) of each node for visualization, resulting in the following coordinates:

kable(as.data.frame(coords_tech_AI[1:10,]))
x y
135.6454 65.62816
134.0933 64.22519
135.2521 58.81728
131.3472 67.12986
135.1214 62.67884
138.8334 59.25243
137.6120 58.32931
138.0070 57.48352
136.7269 61.15738
139.9433 57.74551
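
The graph-construction step just described can be sketched with igraph on a toy matrix. This is a stand-in for the real pipeline: the matrix values are random, and the centrality is stored under the name dgr only to mirror the attribute used in the plotting code below.

```r
library(igraph)
# Toy sketch: relatedness matrix -> weighted graph -> centrality -> pruned FR layout.
set.seed(42)
rel <- matrix(runif(36), 6, 6); rel <- (rel + t(rel)) / 2; diag(rel) <- 0
g <- graph_from_adjacency_matrix(rel, mode = "undirected", weighted = TRUE)
V(g)$dgr <- eigen_centrality(g, weights = E(g)$weight)$vector  # node importance
g_vis <- delete_edges(g, E(g)[weight < mean(E(g)$weight)])     # drop weak links for plotting
coords <- layout_with_fr(g_vis, weights = E(g_vis)$weight)     # x/y coordinate per node
head(coords)
```

The pruning only affects the visualization; centrality is computed on the full weighted graph.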

1.2.3. Preparing Data for Visualization

In the final step of this section, we prepare the specialization data (calculated in Section 1.1) for plotting onto the GTS. We load the summary file (RCA_4countries_detailed.csv) and create a new categorical variable (Var1) that classifies each country-technology pair into one of four states: no specialization (0), general specialization (1), AI-specific (break-through) specialization (2), or coinciding (break-in) specialization (3). This will allow us to map the countries’ technological trajectories directly onto the GTS structure in the next section. The dataset looks like this:

kable(as.data.frame(Newtable[1:10,]))
Var1 Var2 Var3 Freq
No specialization CN 1974-1988 19
General specialization CN 1974-1988 16
AI-specific specialization CN 1974-1988 0
Coinciding specialization CN 1974-1988 0
No specialization JP 1974-1988 12
General specialization JP 1974-1988 7
AI-specific specialization JP 1974-1988 6
Coinciding specialization JP 1974-1988 10
No specialization KR 1974-1988 21
General specialization KR 1974-1988 14
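
The four-state classification described above can be sketched as a simple case_when() over the two RTA scores. The column names and values here are hypothetical; only the RTA >= 1 threshold convention follows the text.

```r
library(dplyr)
# Hypothetical sketch of the four-state coding of Var1 (columns and values invented):
rca <- tibble(RCA_Gen = c(0.4, 1.3, 0.2, 1.8),
              RCA_AI  = c(0.5, 0.7, 1.4, 2.1)) %>%
  mutate(Var1 = case_when(
    RCA_Gen >= 1 & RCA_AI >= 1 ~ "Coinciding specialization",   # break-in (3)
    RCA_Gen <  1 & RCA_AI >= 1 ~ "AI-specific specialization",  # break-through (2)
    RCA_Gen >= 1               ~ "General specialization",      # (1)
    TRUE                       ~ "No specialization"))          # (0)
rca
```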

1.3. Plotting technological spaces

Now that the underlying data and network structures are in place, this section focuses on their visualization. We will generate the key plots presented in the paper, illustrating both the static, global structure of technology and the dynamic, evolving space of AI.

1.3.1. Global technological space (GTS)

We begin by plotting the fundamental structure of the Global Technological Space. This initial visualization is geography-agnostic, meaning it shows the inherent relatedness between technological fields without any country-specific data. The node size corresponds to its centrality (degree), and nodes are clustered and colored by their broader technological sector. This plot serves as the canvas upon which we will later map national trajectories.

  g_tech_AI %>%  ggraph(layout =  coords_tech_AI) + 
  geom_edge_link(aes(width = weight), alpha = 0.4, colour = "grey") + 
  geom_node_point(aes(fill = sector, size = 1000^dgr, shape= sector))+ # 
  scale_shape_manual(values=c(21, 22, 23, 24, 25)) + scale_size("Degree", range = c(2, 12)) + 
  geom_node_text(aes(label = paste0(field_name, "\n(", name, ")")), size = 4, repel = TRUE) +  #field_name or name
  theme_graph(base_family = "sans")+  ggtitle("Global technological space: IPC Technological fields") + 
  theme(legend.title = element_text(size = 14), legend.text = element_text(size = 10)) + 
  guides(colour = guide_legend(override.aes = list(size=10)))+
  geom_mark_hull(aes(x = x, y=y, colour = sector, fill= sector,
                     linetype = sector), alpha = 0.15, expand = unit(2.5, "mm"), size = 1) 

Next, we overlay the country-specific specialization data onto the static GTS canvas. This allows us to visualize the technological trajectory of each country over the three time intervals. The shape of each node indicates the type of specialization (general, break-through, or break-in), while hulls highlight the clusters of specialization for each period. This composite visualization reveals how each nation's technological focus has evolved within the global structure. A horizontal bar plot is also generated to summarize the main indicators based on the country-specific specializations. Taking China as an example, the overlaid visualization and its companion bar plot look like this:

#GTS with specialisations per country
country_select <- c("CN", "US", "JP", "KR")
### First country: CN (set i = 2, 3, 4 for US, JP, KR)
i=1
IPC_RCAs_wide_simplified <- IPC_RCAs_Top4 %>% pivot_wider(id_cols = c(ctry_code, techn_field_nr, Label), 
    names_from = Period_sim,
    values_from = c(RCA_AI_Period, Total_RCA_2, RCA_Gen, RCA_AI, Round_general, Round_AI, Total_RCA), 
    names_glue = "{.value}_Period_{Period_sim}" )

  g_tech_AI %N>% left_join(IPC_RCAs_wide_simplified %>%
                             filter(ctry_code == country_select[i]) %>%
                             select(-ctry_code), by = c("name" = "techn_field_nr")) %>%
  mutate(Shape_Group_P1_Factor = factor(
    ifelse(is.na(Total_RCA_2_Period_1), "NA_Value", as.character(Total_RCA_2_Period_1)),
    levels = c("0", "1", "2", "3", "NA_Value"))) %>% ggraph(layout = coords_tech_AI) +
  geom_edge_link(aes(width = weight), alpha = 0.2, colour = "#CCCCCC", show.legend = FALSE) + 
  geom_node_point(aes(shape = Shape_Group_P1_Factor, 
                      size = 5, stroke = ifelse(Total_RCA_2_Period_1 == 3, 2.5, 1.3),
                      alpha = 1), color = "#FF3300", show.legend = c(shape=TRUE, size=FALSE, stroke=FALSE, alpha=FALSE, color=FALSE)) + 
  geom_node_point(aes(shape = factor(Total_RCA_2_Period_2),
                      size = 5.5, stroke = ifelse(Total_RCA_2_Period_2 == 3, 2.5, 1.3),
                      alpha = 1), color = "#3399FF", show.legend = FALSE) +
  geom_node_point(aes(shape = factor(Total_RCA_2_Period_3), 
                      size = 6.5,stroke = ifelse(Total_RCA_2_Period_3 == 3, 2.5, 1.3),
                      alpha = 1), color = "#009900", show.legend = FALSE) +
  scale_shape_manual(name = "Type of specialisation",
                     values = c("0" = 4, "1" = 1, "2" = 5, "3" = 2, "NA_Value" = 16), breaks = c("0", "1", "2", "3"),                                
                     labels = c("0" = "No specialisation", "1" = "General specialisation", 
                                "2" = "Break-through specialisation", "3" = "Break-in specialisation"), 
                     na.translate = FALSE, drop = FALSE) + scale_size("Degree", range = c(7, 18))+ 
  scale_alpha(guide = "none") + 
  #geom_node_label(aes(label = name), size = 2, repel = F) + 
  geom_mark_hull(aes(filter = Total_RCA_2_Period_1 > .99, x = x, y = y, fill = "Period 1", group = "Period 1"), 
                 concavity = .1, alpha = .11, linetype = "dotted",expand = unit(2, "mm"), size = .5, color = "#FF3300") + 
  geom_mark_hull(aes(filter = Total_RCA_2_Period_2 > .99, x = x, y = y, fill = "Period 2", group = "Period 2"),
                 concavity = .1, alpha = .11, linetype = "longdash",expand = unit(2, "mm"), size = .5, color = "#3399FF") +
  geom_mark_hull(aes(filter = Total_RCA_2_Period_3 > .99, x = x, y = y, fill = "Period 3", group = "Period 3"),
                 concavity = .1, alpha = .02, expand = unit(2, "mm"), size = 1, color = "#009900") +
  scale_fill_manual(name = "Interval colour (same for \nboth nodes and cluster)", # New legend for fill
                    values = c("Period 1" = "#FF3300", "Period 2" = "#3399FF", "Period 3" = "#009900"),
                    labels = c("Interval 1 (1974-1988)", "Interval 2 (1989-2003)", "Interval 3 (2004-2018)")) +
  theme_graph(base_family = "sans") +  theme(legend.position = "bottom", #right
                                             legend.box = "vertical", legend.title = element_text(size = 12, face = "bold"), 
                                             legend.text = element_text(size = 10), legend.key.size = unit(0.7, "cm") ) +
  ggtitle("d) Global technological space: China (1974-2018)") +
  geom_node_text(aes(label = name), size = 5, repel = TRUE) +  #field_name or name
  guides(shape = guide_legend(title.position = "top", 
                              override.aes = list(size = 5, stroke = 1.5, color = "black") ),
         colour = guide_legend(title.position = "top", 
                               override.aes = list(linetype = c("solid", "longdash", "dotted"), 
                                                   alpha = 1, size = 1, shape = NA) ))

  bar_plot_China <- IPC_RCAs_Top4[IPC_RCAs_Top4$ctry_code == country_select[i],] %>%
  arrange(Label, Period) %>%  group_by(Label) %>%
  mutate( general        = Total_RCA_2 == 1,
          break_through  = Total_RCA_2 == 2,
          break_in       = Total_RCA_2 == 3,
          sustained_general       = general       & lag(general, 1, default = FALSE),
          sustained_break_through = break_through & lag(break_through, 1, default = FALSE),
          sustained_break_in      = break_in      & lag(break_in, 1, default = FALSE)) %>%
  ungroup()

bar_plot_China <- bar_plot_China %>%
  group_by(Period) %>% summarise(`General case`                 = sum(general,                 na.rm = TRUE),
                                 `Break-through case`           = sum(break_through,           na.rm = TRUE),
                                 `Break-in case`                = sum(break_in,                na.rm = TRUE),
                                 `Sustained General case`       = sum(sustained_general,       na.rm = TRUE),
                                 `Sustained break-through case` = sum(sustained_break_through, na.rm = TRUE),
                                 `Sustained break-in case`      = sum(sustained_break_in,      na.rm = TRUE),
                                 .groups = "drop") %>% arrange(Period)

plot_long_China <- bar_plot_China |>
  pivot_longer(cols = -Period, names_to = "Indicator", values_to = "Count")

#order labels
plot_long_China$Indicator <- factor(plot_long_China$Indicator, levels = rev(c("General case", "Break-through case",  "Break-in case", 
                                                              "Sustained General case", "Sustained break-through case", "Sustained break-in case")))
plot_long_China$Period <- factor(plot_long_China$Period, levels = c("2004-2018", "1989-2003", "1974-1988"))

legend_order <- c(
  "General case", "Break-through case", "Break-in case",
  "Sustained General case", "Sustained break-through case", "Sustained break-in case"
)

  ggplot(plot_long_China, aes(x = factor(Period),y = Count, fill = Indicator)) +
  geom_col(position = position_dodge(width = .8), width = .7) +
  scale_fill_manual(values = c("General case"                 = "#FF3300",
                               "Sustained General case"       = "#993333",
                               "Break-in case"                = "#009900",
                               "Sustained break-in case"      = "#006633",
                               "Break-through case"           = "#3399FF",
                               "Sustained break-through case" = "#3333CC"),
                    breaks = legend_order) +
  guides(fill = guide_legend(nrow = 2, byrow = TRUE)) +
  labs(x = "Interval",y = "Number of cases", fill = NULL, title = NULL)+
  ggtitle("Summary of specialisations China") +
  theme_classic(base_size = 11) + theme(legend.position = "bottom")+ coord_flip()

The plotting code is structured to iterate through each of the four focus countries by changing the i variable. The resulting figures, each depicting a single country’s trajectory over three periods alongside a summary bar chart, are then saved.

1.3.2. AI-specific technological space (ATS)

Unlike the static GTS, the AI-specific Technological Space (ATS) is dynamic. Its structure is recalculated for each time interval, reflecting the rapid evolution of AI technology. Here, the relatedness between fields is based only on their co-occurrence within AI patents for that specific period. This approach allows us to observe which technological fields form the core of AI innovation at different points in time.

Starting with the first interval (1974-1988), the top 10 most central technological fields in the AI space are:

g_tech_AI %N>%   arrange(desc(dgr)) %>%  as_tibble() %>%  slice(1:10)

We use the previously calculated AI specialization data (AI_RCA) to highlight the core technologies in each period. A binary flag indicates whether AI has an RTA ≥ 1 in a given field (Period_sim indexes the interval, from 1 to 3), like this:

kable(as.data.frame(AI_RCA[1:6,]))
techn_field_nr RCA_AI_Period Period_sim Binary
1 0.0593493 1 0
2 0.0853538 1 0
3 0.3360644 1 0
4 1.0978493 1 1
5 1.6452825 1 1
6 16.6484958 1 1
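
The RTA behind RCA_AI_Period can be sketched as a Balassa-type index: AI's share of patenting in a field, divided by that field's share of all patenting. The counts and column names below are invented toy inputs; the real pipeline aggregates per field over each interval.

```r
library(dplyr)
# Toy sketch of a Balassa-style RTA and the RTA >= 1 binary flag (counts invented):
pat <- tibble(techn_field_nr = 1:3,
              ai_patents  = c(50, 10, 400),
              all_patents = c(5000, 4000, 6000))
pat <- pat %>% mutate(
  RCA_AI_Period = (ai_patents / sum(ai_patents)) / (all_patents / sum(all_patents)),
  Binary        = as.integer(RCA_AI_Period >= 1))
pat
```

A field with Binary = 1 is over-represented in AI patenting relative to its overall weight.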

The following code generates the ATS for the first interval (1974-1988). The nodes with labels are those where AI is specialized (RTA ≥ 1).

AI_RCA1 <- AI_RCA[AI_RCA$Period_sim == 1,]
p=1
  g_tech_AI %N>%
  left_join(AI_RCA1 %>% filter(Period_sim == p), by = c("name" = "techn_field_nr")) %>%
  ggraph(layout = coords_tech_AI) + 
  geom_edge_link(aes(width = weight), alpha = 0.2, colour = "#CCCCCC") +
  geom_node_point(aes(fill = sector, size = 1000^dgr, shape= sector)) +
  scale_shape_manual(values=c(21, 22, 23, 24, 25)) + labs(color   = "RCA")+ scale_size("Degree", range = c(2, 12)) +
  geom_node_text(aes(filter=Binary > .99, label = field_name), size = 6, repel = TRUE) +
  theme_graph(base_family = "sans") + guides(colour = guide_legend(override.aes = list(size=5)))+
  ggtitle("AI-specific technological space (1974-1988)") #

We do the same for the two other intervals and combine the three figures using the multiplot custom function. The resulting figure is saved at Files_created_with_the_code/figures/Figure_2_ATS_and_AI_core_technologies_3_intervals.jpg.

2. Generating Descriptive Figures

This section details the creation of the paper’s descriptive figures. These visualizations illustrate key trends in AI patenting and the evolution of national specialization strategies that motivate our main analysis.

2.1. Share of Break-in specialisations (Fig 6 and 7)

Here, we generate the plots showing the share of ‘break-in’ specializations for each country over time. This metric is central to our paper’s narrative and is calculated as the ratio of coinciding specializations (specialized in both the general field and its AI application) to the country’s total number of general specializations. A higher share indicates that a larger portion of a country’s established technological strengths is being integrated with AI. We first perform this analysis at the technological field level.

The data is processed to count the number of ‘coinciding’, ‘general only’, and ‘AI only’ specializations for each country and period. From these counts, the Share_coinciding is calculated. The resulting summary table is shown below.

SummaryAllData<-distinct(IPC_RCAs, ctry_code, Period, .keep_all = TRUE) 
colnames(SummaryAllData)[1] <- "Country"
head(SummaryAllData)
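
As a toy illustration of how the share is derived from the counts (the column names n_coinciding and n_general_only are assumptions, not the pipeline's actual names):

```r
library(dplyr)
# Hypothetical counts; only the ratio mirrors the Share_coinciding definition in the text:
# coinciding specialisations / total general specialisations (general-only + coinciding).
counts <- tibble(Country        = c("CN", "JP"),
                 n_coinciding   = c(10, 12),
                 n_general_only = c(30, 18))
counts <- counts %>%
  mutate(Share_coinciding = n_coinciding / (n_coinciding + n_general_only))
counts
```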

This summarized data is then used to plot the evolution of the break-in share for the four focus countries (Figure 6).

  ggplot(data=SummaryAllData, aes(x=Period, y=Share_coinciding, group=Country, shape = Country, color=Country)) +
  geom_point(aes(fill = Country), size=8) +   scale_shape_manual(values=c(21, 22, 24, 23)) +
  xlab("Interval") +  ylab("Share of break-in specialisations (%)") +
  theme_classic() +  geom_line(aes(color=Country), linetype = "dashed", size=1.5)+
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")) +
  scale_color_manual(values = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A"))

To ensure the robustness of our findings, we repeat the analysis at a more granular level of technological classification: the 4-digit IPC subclass. This serves as a check to confirm that the observed trends are not an artifact of the broader 35-field aggregation.

The resulting plot (Figure 7) confirms that the trends observed at the field level are consistent at the more detailed subclass level.

  ggplot(data=SummaryAllData4dig, aes(x=Period, y=Share_coinciding, group=Country, shape = Country, color=Country)) +
  geom_point(aes(fill = Country), size=8) + 
  scale_shape_manual(values=c(21, 22, 24, 23)) +
  xlab("Interval") +
  ylab("Share of break-in specialisations (%)") +
  theme_classic() +
  geom_line(aes(color=Country), linetype = "dashed", size=1.5)+
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")) +
  scale_color_manual(values = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")) 

2.2. Growth of AI Patents (Fig 1)

This section reproduces Figure 1 from the paper, which illustrates the dramatic growth in AI patenting since the 1970s. We use the raw AI patent data, aggregating the number of unique patent applications per country for each year. A log-10 scale is used for the y-axis to accommodate the exponential increase in patent counts and allow for a clearer comparison of growth trajectories between the four focus countries, resulting in the figure seen below.

  ggplot(data=test, aes(x=Year, y=log10(Number_of_AI_patents), group=Country, colour=Country, shape=Country)) +
  geom_line(size=1.2, aes(linetype=Country)) +
  geom_point(size=4) +  xlab("Year") +  ylab("Number of new AI registers [Log10]") + theme_classic() +
  scale_linetype_manual(values=c("twodash", "longdash", "solid", "solid")) +
  scale_shape_manual(values=c(16, 15, 17, 18)) + theme(legend.position="bottom") +
  theme(text = element_text(size = 15)) +  scale_y_continuous(limits=c(0,4)) + 
  geom_vline(xintercept = 1988, linetype = "dashed", size = 1, color = "grey") +
  geom_vline(xintercept = 2003, linetype = "dashed", size = 1, color = "grey") +
  scale_x_continuous(breaks = c(1974, 1988, 2003, 2018), limits=c(1974, 2018)) + scale_color_brewer(palette="Dark2") + 
  annotate("rect", xmin = 1974, xmax=1988, ymin = 3.6, ymax = 4, alpha = .01, color = "black") +
  annotate("text", x = 1981, y = 3.8, label = c("First Interval \n(1974-1988)"), size=4)+
  annotate("rect", xmin = 1988, xmax=2003, ymin = 3.6, ymax = 4, alpha = .01, color = "black") +
  annotate("text", x = 1996, y = 3.8, label = c("Second Interval \n(1989-2003)"), size=4) +
  annotate("rect", xmin = 2003, xmax=2018, ymin = 3.6, ymax = 4, alpha = .01, color = "black") +
  annotate("text", x = 2011, y = 3.8, label = c("Third Interval \n(2004-2018)"), size=4)

3. Robustness Checks: Permutation Analysis

To ensure that our findings are statistically robust and not merely the result of random chance, we conduct a permutation analysis. The core idea is to create a “null model” by generating thousands of randomized AI patent datasets. By comparing our actual results to the distribution of results from these random datasets, we can assess the statistical significance of our observations. This section details the creation of these permuted datasets and the subsequent recalculation of specialization metrics.

3.1. Permute the AI dataset

The first step is to generate the randomized, or permuted, datasets. For each of the four focus countries and for each time interval, we follow a specific procedure:

  1. Count the number of actual AI patents the country has in that interval.
  2. Randomly select the same number of patents from that country’s entire pool of patents (both AI and non-AI) for that interval.
  3. Treat this random sample as the new, ‘permuted’ AI dataset for that country.

In the full analysis, this process is repeated 1,000 times to create 1,000 counterfactual scenarios in which 'AI' patents are just random draws from a country's overall technological portfolio (note: num_permutations is set to 10 in this example for faster execution).
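
The core of one such draw can be sketched in a few lines. All objects here (ai_ids, all_ids) are toy stand-ins for the country- and interval-specific patent id sets used in the actual loop below.

```r
# Minimal sketch of one permutation draw for one country (toy ids, not real data):
set.seed(7)
ai_ids  <- c(101, 102, 103)   # step 1: the country's actual AI patent ids in the interval
all_ids <- 101:150            # pool of ALL the country's patents (AI and non-AI)
n_ai <- length(unique(ai_ids))
permuted_ids <- sample(unique(all_ids), n_ai)  # step 2: same-size random draw, no replacement
permuted_ids                  # step 3: these ids define the permuted 'AI' set
```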

We begin by reloading the patent data for the first interval (1974-1988) to establish the pool from which random patents will be drawn.

The resulting count of technological fields per country is:

kable(as.data.frame(region_tech_fields_1_df[1:6,]))
ctry_code techn_field_nr n_tech_reg
AD 20 1
AD 24 1
AD 28 3
AD 32 1
AD 33 1
AD 34 3

The following loop executes the permutation logic. For each of the 10 iterations, it samples a new set of random ‘AI’ patents for the target countries.

list_of_permuted_dfs <- vector("list", length = num_permutations)

for (p in 1:num_permutations) {
  if (p %% 100 == 0) print(paste("Permutation number:", p)) # Progress indicator
  
  # This dataframe will hold the permuted AI patents for target countries ONLY for THIS iteration
  permuted_ai_for_target_countries_iter <- data.frame()
  
  for (country in target_countries) {
    # 1. Identify and Count ACTUAL AI patents for the current country from the original AI dataset
    actual_ai_appln_ids_country <- ai_patents_period_1_df %>%
      filter(ctry_code == country) %>%
      distinct(appln_id) %>%
      pull(appln_id)
    
    n_ai_country <- length(actual_ai_appln_ids_country)
    
    if (n_ai_country == 0) {
      # print(paste("No AI patents found for", country, "in original AI data. Skipping for perm", p))
      next # Skip to the next country if no AI patents to replace
    }
    
    # 2. Prepare the pool of ALL patents for the current country from the general dataset
    country_all_patents_pool <- ipc_all_patents_first_period_df %>%
      filter(ctry_code == country) %>%
      distinct(appln_id)
    
    if (nrow(country_all_patents_pool) == 0) {
      # print(paste("No patents in general pool for", country, ". Skipping for perm", p))
      next
    }
    
    # Handle the edge case where the country's pool contains fewer unique
    # patents than the number of AI patents to replace. Since we sample
    # without replacement, the country is skipped for this permutation in
    # that case (sampling with replacement would be the alternative choice).
    if (nrow(country_all_patents_pool) < n_ai_country) {
      next
    }
    sample_size <- n_ai_country
    replace_sampling <- FALSE
    
    
    # 3. Randomly select an equivalent number of unique appln_ids from this country's general pool
    random_appln_ids_country <- sample(country_all_patents_pool$appln_id,
                                       size = sample_size, # Use adjusted sample_size
                                       replace = replace_sampling) # Use replace_sampling flag
    
    # 4. Get all rows for these randomly selected patents from the ipc_all_patents_first_period_df
    randomly_selected_patents_df_country <- ipc_all_patents_first_period_df %>%
      filter(appln_id %in% random_appln_ids_country & ctry_code == country)
    
    # 5. Add these randomly selected patents for the current country to the iteration's df
    if (nrow(randomly_selected_patents_df_country) > 0) {
      permuted_ai_for_target_countries_iter <- bind_rows(
        permuted_ai_for_target_countries_iter,
        randomly_selected_patents_df_country
      )
    }
  } # End of country loop
  
  # Add the permutation number to all rows of this iteration's dataframe
  if (nrow(permuted_ai_for_target_countries_iter) > 0) {
    permuted_ai_for_target_countries_iter$permutation_number <- p
  }
  
  # Store the dataframe for this iteration in the list
  list_of_permuted_dfs[[p]] <- permuted_ai_for_target_countries_iter
  
} # End of permutation loop

# Combine all permuted dataframes from the list into one large dataframe
final_permuted_dataset <- bind_rows(list_of_permuted_dfs)

The first and last six lines of the resulting permuted dataset look like this:

kable(as.data.frame(final_permuted_dataset[1:6,]))
appln_id ctry_code techn_field_nr permutation_number
16633049 JP 13 1
16633049 JP 16 1
16633049 JP 17 1
16633049 JP 23 1
16633049 JP 29 1
25198664 JP 1 1
tail(final_permuted_dataset)

This process creates a long-format dataframe where each permutation_number represents a complete, unique, randomized AI dataset. A summary for the number of patents for Japan and the US in the first 5 permutations is shown below (only these countries had patents in the first interval; China and South Korea join in the second interval).

final_permuted_dataset %>%
  filter(permutation_number <= 5) %>%
  group_by(permutation_number, ctry_code) %>%
  summarise(unique_appln_ids = n_distinct(appln_id), .groups = 'drop') %>%
  print(n=20)
## # A tibble: 10 × 3
##    permutation_number ctry_code unique_appln_ids
##                 <int> <chr>                <int>
##  1                  1 JP                     307
##  2                  1 US                     107
##  3                  2 JP                     307
##  4                  2 US                     107
##  5                  3 JP                     307
##  6                  3 US                     107
##  7                  4 JP                     307
##  8                  4 US                     107
##  9                  5 JP                     307
## 10                  5 US                     107

A crucial step is to handle the non-target countries. Since our hypothesis is not about them, their actual AI patents are kept constant and are simply replicated across all 1,000 permutations. This ensures that the global context remains stable while only the composition of AI within our focus countries is randomized.
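Replicating a fixed set of rows across all permutation numbers amounts to a cross-join, sketched here in base R with illustrative rows (the actual code operates on the project's dataframes):

```r
# Illustrative non-target-country AI patent rows (held constant)
not_selected_ai <- data.frame(appln_id  = c(16723353, 36147193),
                              ctry_code = c("FR", "IE"))
num_permutations <- 10

# merge() with no common columns produces the Cartesian product, duplicating
# every row once per permutation number 0..num_permutations
replicated <- merge(not_selected_ai,
                    data.frame(permutation_number = 0:num_permutations))
```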

The resulting dataset looks like this for the first and last 6 observations:

kable(as.data.frame(replicated_not_selected_ai_final[1:6,]))
appln_id ctry_code permutation_number
16723353 FR 0
16723353 FR 0
16723353 FR 0
36147193 IE 0
36147193 IE 0
36147193 IE 0
tail(replicated_not_selected_ai_final)

Finally, the permuted data for the target countries is combined with the replicated data for non-target countries. We also add the original, non-permuted AI dataset, labeling it as permutation_number = 0. This allows for direct comparison. The entire collection is then joined with the technological field information to prepare for the RTA calculation.
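The stacking of the actual data (labelled permutation 0) with the permuted draws follows the pattern below (a base-R sketch with hypothetical rows):

```r
# Original AI rows, tagged as permutation 0 for direct comparison
original_ai <- data.frame(appln_id = c(101, 102), ctry_code = c("JP", "US"),
                          permutation_number = 0)

# Rows drawn in the first randomized iteration
permuted_ai <- data.frame(appln_id = c(201, 202), ctry_code = c("JP", "US"),
                          permutation_number = 1)

combined <- rbind(original_ai, permuted_ai)
table(combined$permutation_number)  # one block of rows per permutation number
```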

The result is a single dataframe containing the original AI data (permutation 0) and 10 random variations (again, 10 is used for this illustrative example; the number of permutations was set to 1000 in the files that are saved in the folder Files_created_with_the_code/data/files_code_Fields_analysis/robustness/). A summary of this file for this illustrative example looks like this:

table(final_permuted_dataset$permutation_number)
## 
##    0    1    2    3    4    5    6    7    8    9   10 
## 1418 1180 1240 1224 1161 1223 1251 1198 1225 1229 1154

3.2. Calculate AI-specific specialisations

With the 11 datasets (1 actual + 10 permuted) assembled, we now repeat the exact same Revealed Technological Advantage (RTA) calculation performed in Section 1.1. This is done for each permutation, allowing us to generate a distribution of RTA scores for each technological field under the null hypothesis (i.e., when ‘AI’ is random).
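The RTA itself is a location quotient: a country's share of a field relative to that field's global share, with values above 1 indicating specialisation. A minimal base-R sketch on a toy count matrix (illustrative numbers; the pipeline itself calls the `location_quotient()` function shown below):

```r
# Toy country-by-field patent count matrix
x <- matrix(c(8, 2,
              4, 6),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("JP", "US"), c("f1", "f2")))

# RTA_{c,f} = (x_{c,f} / rowSums) / (colSums / total):
# the country's field share divided by the field's global share
rta <- sweep(x / rowSums(x), 2, colSums(x) / sum(x), "/")
```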

The code below iterates through each permutation_number, calculates the AI-specific RTAs for that dataset, and stores the results.

list_of_rca_dfs <- region_tech_fields_perm_df %>%
  group_by(permutation_number) %>%
  group_split() %>% # This splits the df into a list of dfs, one for each permutation
  purrr::map(~{
    current_permutation_number <- unique(.x$permutation_number)
    print(paste("Processing RCA for permutation_number:", current_permutation_number))
    
    # Build the country-by-field count matrix for this permutation's data
    mat_reg_tech_perm_AI <- .x %>%
      select(-permutation_number) %>% # dropped before pivoting; re-added after the RCA step
      arrange(techn_field_nr, ctry_code) %>%
      pivot_wider(names_from = techn_field_nr,
                  values_from = n_tech_reg,
                  values_fill = 0) # missing country-field combinations become 0
    
    # Check if ctry_code column exists and is not empty
    if (!"ctry_code" %in% names(mat_reg_tech_perm_AI) || nrow(mat_reg_tech_perm_AI) == 0 || all(is.na(mat_reg_tech_perm_AI$ctry_code))) {
      print(paste("Skipping permutation", current_permutation_number, "due to missing ctry_code or empty data after pivot."))
      return(NULL) # Return NULL or an empty tibble
    }
    
    # Check for duplicate ctry_codes which would prevent rownames_to_column
    if (any(duplicated(mat_reg_tech_perm_AI$ctry_code))) {
      print(paste("Warning: Duplicate ctry_code found for permutation", current_permutation_number, ". Aggregating or handling needed."))
      return(tibble(permutation_number = current_permutation_number, error="duplicate ctry_code"))
    }
    
    
    mat_reg_tech_perm_AI <- mat_reg_tech_perm_AI %>%
      remove_rownames() %>%
      column_to_rownames(var = "ctry_code") %>%
      as.matrix() %>%
      round() # counts are already integers, so rounding is a harmless safeguard
    
    # RCA calculation
    # Ensure matrix is suitable (e.g., no NA/NaN/Inf that location_quotient can't handle)
    if (nrow(mat_reg_tech_perm_AI) == 0 || ncol(mat_reg_tech_perm_AI) == 0) {
      print(paste("Skipping RCA for permutation", current_permutation_number, "due to empty matrix."))
      return(NULL)
    }
    
    
    # Rows or columns summing to zero would yield NaN/Inf RCAs;
    # the tryCatch below guards against errors from location_quotient().
    
    rca_results_perm <- tryCatch({
      mat_reg_tech_perm_AI %>%
        location_quotient(binary = FALSE) %>% 
        as.data.frame() %>%
        rownames_to_column("ctry_code") %>%
        as_tibble() %>%
        gather(key = "techn_field_nr", value = "RCA", -ctry_code) %>%
        arrange(ctry_code, techn_field_nr) %>%
        mutate(permutation_number = current_permutation_number) # Add back permutation number
    }, error = function(e) {
      print(paste("Error in location_quotient for permutation", current_permutation_number, ":", e$message))
      return(tibble(permutation_number = current_permutation_number, ctry_code=NA, techn_field_nr=NA, RCA=NA, error_message = e$message)) # Return an empty or error-marked tibble
    })
    
    return(rca_results_perm)
  })
## [1] "Processing RCA for permutation_number: 0"
## [1] "Processing RCA for permutation_number: 1"
## [1] "Processing RCA for permutation_number: 2"
## [1] "Processing RCA for permutation_number: 3"
## [1] "Processing RCA for permutation_number: 4"
## [1] "Processing RCA for permutation_number: 5"
## [1] "Processing RCA for permutation_number: 6"
## [1] "Processing RCA for permutation_number: 7"
## [1] "Processing RCA for permutation_number: 8"
## [1] "Processing RCA for permutation_number: 9"
## [1] "Processing RCA for permutation_number: 10"
# Combine the list of RCA dataframes into one final dataframe
final_rca_all_permutations_df <- bind_rows(list_of_rca_dfs)

The final output is a comprehensive dataframe containing the calculated RTA scores for every country, technology, and permutation, which looks like this for the first and last 6 rows:

kable(as.data.frame(final_rca_all_permutations_df[1:6,]))
ctry_code techn_field_nr RCA permutation_number
AT 1 0 0
AT 10 0 0
AT 11 0 0
AT 12 0 0
AT 13 0 0
AT 17 0 0
tail(final_rca_all_permutations_df)

This process is repeated for all three time intervals, and the resulting dataframes are saved at Files_created_with_the_code/data/files_code_Fields_analysis/robustness/. These files form the basis for the statistical tests in our econometric analysis. It is also worth noting that these permutations are repeated for several distinct interval lengths, i.e., 15-year, 10-year, 5-year, and 1-year intervals.

4. Regression Analysis

This final section presents the econometric analysis designed to formally test our paper’s hypotheses. Using the data prepared in the previous steps, we construct a panel dataset covering our four focus countries across nine 5-year intervals (from 1974-1978 to 2014-2018). We then run a series of regression models to investigate the factors influencing the emergence and persistence of different types of technological specializations.

The following code block handles the final data preparation, loading the pre-calculated metrics for relative density and specializations, and merging them into a single dataframe ready for regression.
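The merge logic can be sketched as follows (hypothetical file contents and column names, chosen purely for illustration; the real variables are those shown in the preview table):

```r
# Hypothetical inputs: one file with relatedness densities, one with the
# specialization measures, keyed by country and period
density_df <- data.frame(Country = c("Japan", "US"),
                         period  = "1974-1978",
                         rel_density = c(57, 50))
spec_df    <- data.frame(Country = c("Japan", "US"),
                         period  = "1974-1978",
                         share_break_in = c(0.0500, 0.0625))

# Inner join on the panel identifiers
regression_data <- merge(density_df, spec_df, by = c("Country", "period"))
```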

The final dataset for regression looks like this:

head(regression_data_renamed) %>%
  knitr::kable(
    caption = "Preview of the Final Regression Dataset",
    booktabs = TRUE # A style option for prettier tables
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE
  )
Preview of the Final Regression Dataset
Country Country’s techn. rel. dens. period no_specialization general_specialization ai_specific_specialization No. of ‘break-in’ spec. actual_share_coinciding actual_share_round_ai actual_share_round_general No. of sustained ‘break-in’ spec. actual_persistent_just_general actual_persistent_just_ai No. of sustained ‘AI-specific’ spec. actual_n_ai_prev_coinciding actual_n_coinciding_prev_ai actual_n_ai_prev_gen actual_n_persistent_core_fields actual_n_persistent_not_core_fields actual_n_persistent_coin_core_fields actual_n_persistent_coin_not_core_fields actual_ai_core_fields actual_ai_not_core_fields No. of sustained ‘general’ spec. No. of ‘general’ spec. double_check total_specializations No. of ‘AI-specific’ spec. Share of ‘break-in’ spec. Interval
Japan 57 1974-1978 15 19 0 1 0.0500000 0.0285714 0.5714286 0 0 0 0 0 0 0 0 0 0 0 1 0 0 20 35 20 1 0.0500000 1974-1978
South Korea 38 1974-1978 22 13 0 0 0.0000000 0.0000000 0.3714286 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 35 13 0 0.0000000 1974-1978
US 50 1974-1978 19 15 0 1 0.0625000 0.0285714 0.4571429 0 0 0 0 0 0 0 0 0 0 0 0 1 0 16 35 16 1 0.0625000 1974-1978
China 35 1974-1978 24 11 0 0 0.0000000 0.0000000 0.3142857 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 35 11 0 0.0000000 1974-1978
China 32 1979-1983 23 12 0 0 0.0000000 0.0000000 0.3428571 0 5 0 0 0 0 0 0 0 0 0 0 0 5 12 35 12 0 0.0000000 1979-1983
Japan 56 1979-1983 17 16 0 2 0.1111111 0.0571429 0.5142857 1 15 0 1 0 0 0 1 0 1 0 2 0 17 18 35 18 2 0.1111111 1979-1983

4.1. Main models (i.e., the ones from the paper)

Our main analysis consists of two sets of Ordinary Least Squares (OLS) models. The first set (Table 3) examines the factors that influence a country’s share of ‘break-in’ specializations. The second set (Table 4) investigates the determinants of the persistence of specializations over time.

The first three models test the effect of technological relatedness density on the share of break-in specializations. Model 1 provides a baseline, Model 2 adds control variables for the number of general and AI-specific specializations, and Model 3 includes country fixed effects. The estimations for the first three models are:
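The three specifications can be sketched with lm() as follows (simulated stand-in data and hypothetical variable names; the real models use the panel assembled above):

```r
# Simulated stand-in panel: 4 countries x 9 intervals = 36 observations
set.seed(1)
panel <- data.frame(
  share_break_in = runif(36),
  rel_density    = rnorm(36, mean = 45, sd = 10),
  n_general      = rpois(36, 15),
  n_ai_specific  = rpois(36, 3),
  Interval       = factor(rep(paste0("t", 1:9), times = 4)),
  Country        = factor(rep(c("JP", "KR", "US", "CN"), each = 9))
)

m1 <- lm(share_break_in ~ rel_density + Interval, data = panel)  # baseline
m2 <- update(m1, . ~ . + n_general + n_ai_specific)              # + controls
m3 <- update(m2, . ~ . + Country)                                # + country FE
```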

Effects on the share of break-ins - OLS regression
Dependent variable:
Share of ‘break-in’ spec.
(1) (2) (3)
Country’s techn. rel. dens. 0.001 (0.005) 0.001 (0.006) -0.001 (0.006)
Interval1979-1983 0.004 (0.132) -0.012 (0.074) -0.013 (0.075)
Interval1984-1988 0.164 (0.132) -0.016 (0.079) -0.045 (0.084)
Interval1989-1993 0.348** (0.131) 0.019 (0.085) -0.005 (0.089)
Interval1994-1998 0.405*** (0.131) 0.005 (0.093) -0.002 (0.095)
Interval1999-2003 0.455*** (0.136) -0.007 (0.098) -0.034 (0.103)
Interval2004-2008 0.535*** (0.132) 0.080 (0.095) 0.048 (0.100)
Interval2009-2013 0.534*** (0.131) 0.065 (0.096) 0.042 (0.101)
Interval2014-2018 0.666*** (0.132) 0.118 (0.105) 0.093 (0.110)
CountryJapan -0.018 (0.055)
CountrySouth Korea 0.063 (0.054)
CountryUS 0.049 (0.051)
No. of ‘general’ spec. -0.010 (0.016) 0.002 (0.018)
No. of ‘AI-specific’ spec. 0.036*** (0.005) 0.038*** (0.005)
Constant -0.036 (0.227) 0.100 (0.130) 0.017 (0.152)
Observations 36 36 36
R2 0.673 0.905 0.915
Adjusted R2 0.560 0.862 0.859
Residual Std. Error 0.185 (df = 26) 0.104 (df = 24) 0.105 (df = 21)
F Statistic 5.958*** (df = 9; 26) 20.802*** (df = 11; 24) 16.181*** (df = 14; 21)
Note: *p<0.1; **p<0.05; ***p<0.01

The next set of models shifts the focus to persistence. The dependent variable is now the count of specializations of a certain type that are sustained from the previous period. These models help us understand what factors contribute to the durability of a country’s technological advantages. The estimations for these three models are:

Effects on persisting specialisations - OLS regression
Dependent variable:
No. of sustained ‘break-in’ spec. No. of sustained ‘AI-specific’ spec.
(1) (2) (3)
Country’s techn. rel. dens. 0.039 (0.082) 0.015 (0.031) 0.006 (0.078)
Interval1979-1983 0.480 (1.092) -1.431 (1.171) -0.124 (1.044)
Interval1984-1988 -0.941 (1.192) -1.901 (1.120) -0.628 (1.151)
Interval1989-1993 0.423 (1.258) -1.942 (1.330) -0.292 (1.201)
Interval1994-1998 1.854 (1.368) -1.263 (1.148) 0.158 (1.354)
Interval1999-2003 1.708 (1.477) -1.280 (1.234) -0.520 (1.447)
Interval2004-2008 0.919 (1.405) -1.812 (1.245) 0.807 (1.350)
Interval2009-2013 3.546** (1.420) 0.012 (1.305) -1.135 (1.524)
Interval2014-2018 2.861* (1.572) -0.420 (1.297) -0.458 (1.601)
No. of ‘break-in’ spec. 0.413* (0.226) -0.182 (0.231)
No. of ‘general’ spec. 0.113 (0.239) -0.053 (0.228)
No. of ‘AI-specific’ spec. -0.067 (0.150) 0.297** (0.143)
No. of sustained ‘break-in’ spec. 1.115*** (0.199)
No. of sustained ‘general’ spec. 0.133* (0.074)
No. of sustained ‘AI-specific’ spec. 0.525*** (0.070)
Constant -3.642* (1.955) -0.669 (1.464) 0.482 (1.997)
Observations 36 36 36
R2 0.799 0.914 0.921
Adjusted R2 0.694 0.874 0.875
Residual Std. Error 1.531 (df = 23) 0.981 (df = 24) 1.458 (df = 22)
F Statistic 7.606*** (df = 12; 23) 23.111*** (df = 11; 24) 19.828*** (df = 13; 22)
Note: *p<0.1; **p<0.05; ***p<0.01

4.2. Extensions (i.e., models used for additional robustness, not included in the paper)

To ensure the robustness of our results, we re-estimate our models using specifications better suited to the nature of our dependent variables.

  • For the models predicting the share of break-ins (a value between 0 and 1), we use Beta regression.

  • For the models predicting the count of persistent specializations, we use Poisson and Negative Binomial regressions, which are designed for count data.
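Under these distributional assumptions, the fitting calls look like this (simulated data and hypothetical variable names; glm() is base R, while glm.nb() comes from MASS and betareg() from the CRAN package betareg):

```r
# Simulated stand-in for the persistence panel
set.seed(2)
df <- data.frame(
  n_sustained = rpois(36, lambda = 2),
  rel_density = rnorm(36, mean = 45, sd = 10),
  Interval    = factor(rep(paste0("t", 1:9), times = 4))
)

# Poisson model for the count outcome
m_pois <- glm(n_sustained ~ rel_density + Interval,
              family = poisson, data = df)

# The Negative Binomial and Beta variants are fit analogously:
#   MASS::glm.nb(n_sustained ~ rel_density + Interval, data = df)
#   betareg::betareg(share_break_in ~ rel_density + Interval, data = ...)
```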

The Beta regression results for the share of break-ins are presented below.

Effects on the share of break-ins - Beta regression
Dependent variable:
Share of ‘break-in’ spec.
(1) (2) (3)
Country’s techn. rel. dens. 0.018 (0.020) 0.015 (0.025) -0.004 (0.023)
Interval1979-1983 -0.046 (0.675) -0.156 (0.582) -0.317 (0.564)
Interval1984-1988 0.442 (0.654) 0.355 (0.536) 0.216 (0.508)
Interval1989-1993 2.031*** (0.604) 0.882* (0.513) 0.707 (0.492)
Interval1994-1998 2.199*** (0.605) 0.696 (0.538) 0.588 (0.514)
Interval1999-2003 2.553*** (0.629) 0.677 (0.553) 0.442 (0.535)
Interval2004-2008 2.867*** (0.611) 1.091** (0.545) 0.883* (0.522)
Interval2009-2013 2.784*** (0.608) 0.968* (0.545) 0.797 (0.522)
Interval2014-2018 3.201*** (0.622) 1.082* (0.586) 0.923* (0.561)
CountryJapan -0.004 (0.230)
CountrySouth Korea 0.324 (0.250)
CountryUS 0.526** (0.220)
No. of ‘general’ spec. -0.088 (0.069) -0.041 (0.072)
No. of ‘AI-specific’ spec. 0.188*** (0.023) 0.208*** (0.023)
Constant -3.342*** (1.052) -2.609*** (0.762) -2.769*** (0.818)
Observations 36 36 36
R2 0.735 0.895 0.895
Log Likelihood 28.419 48.647 52.027
Note: *p<0.1; **p<0.05; ***p<0.01

Compared to the OLS-based Table 3, the estimations from the Beta regression are highly consistent. The main differences are minor shifts in significance levels for some time intervals and the US country dummy. Importantly, the main variable of interest, “Country’s techn. rel. dens.”, retains its significance and sign, confirming the robustness of our primary finding.

Next, we test the robustness of the persistence models. The results from the Poisson regression are shown first, followed by the Negative Binomial models, which can be more appropriate if the count data is over-dispersed.

Effects on persisting specialisations - Poisson regression
Dependent variable:
No. of sustained ‘break-in’ spec. No. of sustained ‘AI-specific’ spec.
(1) (2) (3)
Country’s techn. rel. dens. 0.039 (0.031) 0.010 (0.028) 0.021 (0.025)
Interval1979-1983 16.895 (2,728.392) 15.932 (2,842.308) 16.842 (2,853.177)
Interval1984-1988 16.295 (2,728.392) 15.805 (2,842.308) 17.421 (2,853.176)
Interval1989-1993 18.352 (2,728.392) 17.436 (2,842.308) 18.303 (2,853.176)
Interval1994-1998 18.704 (2,728.392) 17.984 (2,842.308) 18.490 (2,853.176)
Interval1999-2003 18.908 (2,728.392) 18.021 (2,842.308) 18.434 (2,853.176)
Interval2004-2008 18.676 (2,728.392) 17.928 (2,842.308) 18.688 (2,853.176)
Interval2009-2013 19.134 (2,728.392) 18.119 (2,842.308) 18.196 (2,853.176)
Interval2014-2018 18.911 (2,728.392) 18.078 (2,842.308) 18.282 (2,853.176)
No. of ‘break-in’ spec. 0.048 (0.086) -0.054 (0.072)
No. of ‘general’ spec. 0.016 (0.091) -0.102 (0.075)
No. of ‘AI-specific’ spec. 0.037 (0.062) 0.111** (0.050)
No. of sustained ‘break-in’ spec. 0.166*** (0.063)
No. of sustained ‘general’ spec. 0.072 (0.058)
No. of sustained ‘AI-specific’ spec. 0.106*** (0.039)
Constant -20.368 (2,728.392) -18.733 (2,842.308) -17.771 (2,853.177)
Observations 36 36 36
Log Likelihood -47.003 -45.297 -53.420
Akaike Inf. Crit. 120.005 114.594 134.839
Note: *p<0.1; **p<0.05; ***p<0.01

For the Negative binomial, the results look like this:

Effects on persisting specialisations - Negative binomial regression
Dependent variable:
No. of sustained ‘break-in’ spec. No. of sustained ‘AI-specific’ spec.
(1) (2) (3)
Country’s techn. rel. dens. 0.039 (0.031) 0.010 (0.028) 0.021 (0.025)
Interval1979-1983 17.895 (4,498.357) 16.932 (4,686.174) 17.842 (4,704.093)
Interval1984-1988 17.295 (4,498.357) 16.805 (4,686.174) 18.421 (4,704.093)
Interval1989-1993 19.352 (4,498.357) 18.436 (4,686.174) 19.303 (4,704.093)
Interval1994-1998 19.704 (4,498.357) 18.984 (4,686.173) 19.490 (4,704.093)
Interval1999-2003 19.908 (4,498.357) 19.021 (4,686.173) 19.434 (4,704.093)
Interval2004-2008 19.676 (4,498.357) 18.928 (4,686.173) 19.688 (4,704.093)
Interval2009-2013 20.134 (4,498.357) 19.119 (4,686.173) 19.196 (4,704.093)
Interval2014-2018 19.911 (4,498.357) 19.078 (4,686.173) 19.282 (4,704.093)
No. of ‘break-in’ spec. 0.048 (0.086) -0.054 (0.072)
No. of ‘general’ spec. 0.016 (0.091) -0.102 (0.075)
No. of ‘AI-specific’ spec. 0.037 (0.062) 0.111** (0.050)
No. of sustained ‘break-in’ spec. 0.166*** (0.063)
No. of sustained ‘general’ spec. 0.072 (0.058)
No. of sustained ‘AI-specific’ spec. 0.106*** (0.039)
Constant -21.368 (4,498.357) -19.733 (4,686.174) -18.771 (4,704.093)
Observations 36 36 36
Log Likelihood -48.003 -46.298 -54.420
theta 46,856.060 (755,064.600) 45,122.010 (654,584.900) 52,915.220 (653,410.000)
Akaike Inf. Crit. 122.006 116.595 136.841
Note: *p<0.1; **p<0.05; ***p<0.01

The results from both the Poisson and Negative Binomial regressions are nearly identical to each other and largely consistent with the OLS models. While some variables with marginal significance in the OLS models lose significance here, the main relationships of interest hold. This confirms that our findings regarding the persistence of specializations are not sensitive to the choice of a linear versus a count data model specification.