Patenting is one of the best historical records we have of technological innovation over the last two centuries. Patents provide a view of major developments and trends in different areas of technology,
including technologies that can be called “greentech”. Different operational definitions are possible; one is Environmentally Sound Technologies (ESTs), as described by the United Nations Framework Convention on Climate Change and tracked by the World Intellectual Property Organization. Technologies with an environmentally sound impact are naturally relevant to a number of the Sustainable Development Goals. With millions of patents being granted each year worldwide, a challenge is that the patent
system itself is overwhelmingly large and often too complicated for manual analysis. For many purposes in both research and commercial activities, researchers, firms, and individuals want
to keep track of recent innovations in a particular technology class. Patent offices manage this through large-scale organization, but it can be hard for smaller research teams, firms,
and individuals with limited resources. Moreover, the patenting process takes time, on average around three years, which makes it difficult to get a sufficiently recent view of the latest developments.
The aim of this project is to apply and evaluate machine learning algorithms that automatically search for and find greentech innovations no later than 18 months after they have been filed
for patenting (in the US). In other words, we attempt to reduce the lag behind the innovation frontier by around 50% (from roughly three years to 18 months), and to evaluate how well methods from AI-based natural language processing
work to achieve this end. The sheer volume of patent data creates a natural opportunity for analysis using methods from modern machine learning. We specifically select greentech patents from the full
range of US patents from 1976 onwards (over 6 million patent full texts) and frame the selection as a statistical learning task. Because many patents have already been classified
and manually labelled by human experts, we have training data for machine learning models that can learn to label new (as yet unlabelled) patent texts as either greentech or not.
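
To illustrate this supervised framing, the following is a minimal sketch of a binary text classifier. It is not the project's actual method: it assumes a simple multinomial Naive Bayes model with add-one smoothing, written in plain Python, and the short training texts, the label names `greentech` and `other`, and the class name `NaiveBayesTextClassifier` are hypothetical stand-ins for real labelled patent full texts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.n_docs = len(labels)
        self.doc_counts = Counter(labels)
        # Per-class word frequencies collected from the labelled texts.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def log_scores(self, text):
        """Return the unnormalised log posterior of each class for a text."""
        # Words never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / self.n_docs)  # class prior
            denom = self.total_words[c] + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy examples standing in for expert-labelled patent texts.
train_texts = [
    "wind turbine blade for improved energy capture",
    "photovoltaic solar cell with increased conversion efficiency",
    "method for carbon dioxide capture from flue gas",
    "battery storage system for renewable energy",
    "smartphone touch screen display assembly",
    "method for compressing digital video streams",
    "golf club head with adjustable weight",
    "pharmaceutical composition for treating headache",
]
train_labels = ["greentech"] * 4 + ["other"] * 4

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)

# New, not yet labelled texts (also hypothetical).
new_texts = [
    "solar panel mounting system for renewable energy generation",
    "video game controller with touch display",
]
predictions = [model.predict(t) for t in new_texts]
for text, label in zip(new_texts, predictions):
    print(f"{label}: {text}")
```

In practice, a model of this kind would be trained on the existing expert-assigned patent classifications and evaluated on held-out patents before being applied to newly filed applications.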
Research Engineer at Chalmers, Computer Science and Engineering, CSE Operations Support, Data Science Research Engineers
Senior Researcher at Chalmers, Space, Earth and Environment, Physical Resource Theory
Funding Chalmers' participation during 2019–