Appendix 2: Methodology for Patent Landscaping
Below is a description of the steps taken for the patent-landscaping exercise, the results of which are presented in Chapter 2.
Dataset-generation methodology
In the first stage, Chatham House and CambridgeIP mapped out the relevant technology areas that contribute to emissions mitigation from cement and concrete production. This was supplemented by a survey of the broader intellectual property landscape for cement and concrete to build up a set of keyword descriptors and classification systems, including Cooperative Patent Classification (CPC) and International Patent Classification (IPC) systems, for the different technology areas.
On the basis of expert interviews, stakeholder engagement and desktop research, the scope of the patent analysis was narrowed to products and processes that lower or entirely replace the Portland clinker content of cement and concrete. Once this focus area was chosen, CambridgeIP built a comprehensive Boolean search algorithm based on a combination of keyword descriptors and targeted CPC- and IPC-based searches. Boolean search algorithms are a commonly used patent search method. As a very simple illustration, a search for belitic clinkers might be: (belite OR ‘dicalcium silicate’ OR Ca2SiO4) AND (clinker OR cement).
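The simple belitic-clinker query above can be sketched in code as follows. This is a minimal illustration only, not the search algorithm CambridgeIP built: the function names are hypothetical, and the prefix-style matching (so that 'clinkers' also matches 'clinker') is an assumption standing in for the truncation operators that patent databases typically offer.

```python
import re


def contains(text: str, term: str) -> bool:
    """Case-insensitive match of a term at the start of a word.

    Assumption: prefix matching approximates a truncation operator,
    so "cement" also matches "cements".
    """
    pattern = r"\b" + re.escape(term)
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches_belite_query(text: str) -> bool:
    """Evaluate (belite OR 'dicalcium silicate' OR Ca2SiO4) AND (clinker OR cement)."""
    material_terms = ("belite", "dicalcium silicate", "Ca2SiO4")
    product_terms = ("clinker", "cement")
    has_material = any(contains(text, term) for term in material_terms)
    has_product = any(contains(text, term) for term in product_terms)
    return has_material and has_product


if __name__ == "__main__":
    # Hypothetical title text, for illustration only.
    print(matches_belite_query("A belite-rich clinker with reduced emissions"))
```

In practice the same test would be applied to the title, abstract and claims fields of each record, and a production query would combine many more terms and classification codes.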
Searches were performed on the title, abstract and claims fields across all available patent databases. The patent dataset was compiled from LexisNexis’s TotalPatent database.464 Patent searches were conducted in the first quarter of 2017. Table 8 gives an overview of the subsequent filtering and quality control steps taken.
Table 8: Overview of patent dataset creation
Process stage | Detail | Dataset size |
---|---|---|
Dataset 1 | Keyword descriptors and IPC/CPC codes are combined through iterative development into a search algorithm that collects relevant patent documents into a broadly focused inclusive dataset. | 19,225 |
Dataset 2 | The dataset then has all patent family duplicates temporarily removed to enable manual expert review and data cleaning. Name normalization is undertaken to account for assignee and inventor name variations throughout the dataset so as to standardize publication ownership. | 2,170 |
Dataset 3 | A semi-automated manual expert review of this family-collapsed dataset filters out any false positives collected by the broad search algorithm through combinations of title, abstract and claim keywords, classification codes and assignee filtering. Relevance for remaining documents is confirmed through random sampling. | 1,571 |
Dataset 4 | The final expert-reviewed dataset is re-supplemented with all relevant family members to create the final dataset, including all relevant patent documents. | 4,577 |
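The family-collapse, name-normalization and re-expansion steps in Table 8 can be sketched as below. This is a simplified illustration under stated assumptions, not the procedure actually used: the `PatentDoc` record, the function names and the list of legal-form suffixes are all hypothetical, and the expert review step is represented only by whatever subset of records is passed to `expand_families`.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical list of legal-form suffixes to strip during normalization;
# a real project would maintain a longer, reviewed list.
LEGAL_SUFFIXES = {"LTD", "LIMITED", "INC", "GMBH", "AG", "SA", "CORP", "LLC", "PLC"}


@dataclass(frozen=True)
class PatentDoc:
    publication_number: str
    family_id: str
    assignee: str


def normalize_assignee(name: str) -> str:
    """Upper-case, drop punctuation and remove legal-form suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.upper()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def collapse_families(docs: List[PatentDoc]) -> List[PatentDoc]:
    """Keep the first document seen for each patent family (Dataset 2)."""
    representatives: Dict[str, PatentDoc] = {}
    for doc in docs:
        representatives.setdefault(doc.family_id, doc)
    return list(representatives.values())


def expand_families(reviewed: List[PatentDoc], all_docs: List[PatentDoc]) -> List[PatentDoc]:
    """Re-add every family member of the reviewed documents (Dataset 4)."""
    kept_families = {doc.family_id for doc in reviewed}
    return [doc for doc in all_docs if doc.family_id in kept_families]
```

Collapsing to one record per family before review means each invention is assessed once; re-expanding afterwards restores all publications belonging to the families judged relevant.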
Using CPC and IPC codes
CPC- and IPC-based searches use CPC and IPC codes assigned by the patent examiner to find patents. For example, technologies relating to climate-change mitigation in the context of cement production might be assigned the CPC code Y02P40/10.
Although these codes are a helpful guide for defining the technology space, CPC- and IPC-based searches are likely to be imperfect. In the case of CPC codes in particular, not all historical patents have been manually assessed, so some may be missing from this dataset.465 For new patents, CPC codes are assigned directly during the examination procedure and so will be more accurate. Moreover, there are likely to be innovations that lie outside the definition used for a given code but that contribute in some way to the outcome in question. For example, Y02P40/10 codes are application-based rather than directly technology-based, which results in a fuzzier overlap with older classification systems and between technology subsystems. Even with a highly specific CPC code, it is difficult to distinguish between different technology systems. We therefore see CPC- and IPC-based searches as insufficient on their own, but as a valuable complement to Boolean searches.
In technology areas that did not fall within the specific search focus – such as alternative fuel use or CCS, as in figures 9 and 10 in Chapter 2 – we used CPC and IPC codes to get a general sense of the intellectual property landscape while recognizing that this was likely to underestimate overall patent numbers within those technology areas.
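A classification-code filter of the kind described in this section can be sketched as follows. This is a minimal illustration, assuming each record carries a list of assigned codes as strings; the function names are hypothetical, and the only code taken from the text is Y02P40/10. Matching is exact after normalization, because the CPC hierarchy is not encoded in the code string itself and prefix matching on subgroup numbers would therefore be unreliable.

```python
from typing import Iterable


def normalize_code(code: str) -> str:
    """Remove whitespace and upper-case, so 'y02p 40/10' equals 'Y02P40/10'."""
    return "".join(code.split()).upper()


def has_target_code(doc_codes: Iterable[str], target_codes: Iterable[str]) -> bool:
    """Return True if any code assigned to the document is a target code."""
    targets = {normalize_code(code) for code in target_codes}
    return any(normalize_code(code) in targets for code in doc_codes)


def count_code_hits(records: Iterable[Iterable[str]], target_codes: Iterable[str]) -> int:
    """Count records carrying at least one target code."""
    target_list = list(target_codes)
    return sum(1 for codes in records if has_target_code(codes, target_list))
```

A count produced this way is a lower bound for the reasons given above: records that were never assigned the target code are not found, however relevant they may be.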
Disaggregation of technology subcategories
Existing expert research466 indicated the presence of important technology subcategories within the focus search area. We therefore further disaggregated the dataset into more focused subcategories so as to analyse patterns within these as well. For each technology subcategory, we developed sets of the keywords most likely to be used by patents within it (e.g. for waste glass, these would include ‘waste glass’, ‘glass’, ‘recycled glass’ and ‘recycled glass powder’). Searches for these keywords were performed within title, abstract and claims, and combined with CPC and IPC classification codes to filter the dataset into category groupings.
These groupings were reviewed manually to determine whether any systematic false positives were encroaching upon the categories as a result of alternative uses for keywords, or as a result of records being captured under classification codes for non-relevant applications. These records were then filtered and removed from that category designation.
After multiple iterations of this approach, clean categories were developed, grouping related technologies together. During the category review process, further subdivisions providing greater granularity were sometimes recognized, resulting in further separations within the original planned categories. The full set of technologies included within this focus area is mapped out in Appendix 1.
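The keyword-based grouping and the removal of systematic false positives can be sketched as below. This is an illustrative simplification: the waste-glass keywords are those quoted above, but the exclusion phrases are hypothetical examples of an 'alternative use' of a keyword, and the automated exclusion stands in for what was a manual review. Classification codes, which were also used in the filtering, are omitted for brevity.

```python
from typing import Dict, List, Tuple

# Keywords for one subcategory, as quoted in the text.
CATEGORY_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "waste glass": ("waste glass", "glass", "recycled glass", "recycled glass powder"),
}

# Hypothetical exclusion phrases representing alternative uses of a keyword
# that a reviewer might flag as systematic false positives.
CATEGORY_EXCLUSIONS: Dict[str, Tuple[str, ...]] = {
    "waste glass": ("glass fibre", "glass fiber"),
}


def assign_categories(text: str) -> List[str]:
    """Return the subcategories whose keywords appear in the text,
    skipping any category for which an exclusion phrase also appears."""
    lowered = text.lower()
    categories = []
    for category, keywords in CATEGORY_KEYWORDS.items():
        if not any(keyword in lowered for keyword in keywords):
            continue
        exclusions = CATEGORY_EXCLUSIONS.get(category, ())
        if any(phrase in lowered for phrase in exclusions):
            continue
        categories.append(category)
    return categories
```

The exclusion rule here is deliberately crude: a record that mentions both recycled glass and glass fibre would be dropped from the category, which is the kind of case a manual review is better placed to resolve.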
Limitations
Patent landscaping has several limitations, which were not fully addressed owing to resource constraints among other factors. These limitations include the following:
- There is a lag of up to 18 months in the publication of patent applications by various patent offices. Recent changes in the landscape may not be captured by the analysis.
- The searches were performed in English. This should capture the majority of relevant patents and patent families. However, owing to language differences, some patents are likely to have been missed and false positives may have cropped up due to mistranslations.
- The cement industry sees many mergers and acquisitions. Following an acquisition, patents are frequently not reassigned to the new owner, so assignee names may not always reflect these changes.
- Some relevant technologies may have been missed in the focus search area and in the technology subcategorization. Moreover, the boundaries of the technology spaces shift over time, so there may be some newer areas of innovation that were not identified.
- Smaller patent portfolios may, on occasion, play a more significant role than is suggested by the patent rankings. Some of the important disruptive technology and innovation may come from SMEs and individual innovators. These tend to file a small number of patents due to limited resources, and therefore may not be picked up in analysis of key players based on total numbers of patents held.