Binary indexes for optimising corpus queries
Paper in proceedings, 2024

Being able to search for patterns in annotated text corpora is crucial for many research disciplines. However, searching for complex patterns in large corpora can take a long time, sometimes several minutes or even hours. We investigate how inverted indexes, and in particular binary indexes, can be used for efficient searching in large annotated corpora. We show how corpus queries are translated into lookups in unary and binary inverted indexes, and give strategies for combining the results using efficient set operations. In addition, we discuss how to make use of binary indexes for more complex query types.
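The abstract only outlines the approach. The following is a minimal sketch, not the authors' implementation, under these assumptions: each token is a set of attribute=value features; a query is a sequence of single-feature constraints on consecutive tokens; and the binary index maps a pair of features on two adjacent tokens to the positions where that pair occurs. The function names (`build_indexes`, `intersect`, `search`, `search_unary_only`) and the toy corpus are hypothetical. Each adjacent pair of query constraints becomes one binary-index lookup, and the resulting sorted posting lists are combined by merge intersection after aligning them on the match start position.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Token = Dict[str, str]          # e.g. {"word": "the", "pos": "DT"}
Feature = Tuple[str, str]       # (attribute, value), e.g. ("pos", "DT")


def build_indexes(corpus: Sequence[Token]):
    """Hypothetical index layout: unary maps feature -> positions, binary maps
    (feature at i, feature at i+1) -> positions i. Positions are appended in
    increasing order, so every posting list is sorted."""
    unary: Dict[Feature, List[int]] = defaultdict(list)
    binary: Dict[Tuple[Feature, Feature], List[int]] = defaultdict(list)
    for i, tok in enumerate(corpus):
        for f in tok.items():
            unary[f].append(i)
        if i + 1 < len(corpus):
            for f1 in tok.items():
                for f2 in corpus[i + 1].items():
                    binary[(f1, f2)].append(i)
    return unary, binary


def intersect(a: List[int], b: List[int], shift: int = 0) -> List[int]:
    """Merge-intersect two sorted lists; position p in b matches p - shift in a."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        x, y = a[i], b[j] - shift
        if x == y:
            result.append(x)
            i += 1
            j += 1
        elif x < y:
            i += 1
        else:
            j += 1
    return result


def search(query: Sequence[Feature], unary, binary) -> List[int]:
    """Start positions s such that query[k] holds at token s + k."""
    if not query:
        return []
    if len(query) == 1:
        return list(unary.get(query[0], []))
    # One binary lookup per adjacent pair; pair k starts at s + k, so shift by k.
    result = list(binary.get((query[0], query[1]), []))
    for k in range(1, len(query) - 1):
        result = intersect(result, binary.get((query[k], query[k + 1]), []), shift=k)
        if not result:
            break
    return result


def search_unary_only(query: Sequence[Feature], unary) -> List[int]:
    """Baseline for comparison: intersect one unary posting list per token."""
    if not query:
        return []
    result = list(unary.get(query[0], []))
    for k in range(1, len(query)):
        result = intersect(result, unary.get(query[k], []), shift=k)
    return result


# Toy annotated corpus (hypothetical data).
corpus = [{"word": w, "pos": p} for w, p in [
    ("the", "DT"), ("cat", "NN"), ("sat", "VB"), ("on", "IN"), ("the", "DT"),
    ("mat", "NN"), ("the", "DT"), ("cat", "NN"), ("slept", "VB"),
]]
unary, binary = build_indexes(corpus)
print(search([("pos", "DT"), ("word", "cat")], unary, binary))
print(search([("pos", "DT"), ("pos", "NN"), ("pos", "VB")], unary, binary))
```

Both queries return `[0, 6]`. A plausible motivation for the binary index is that a pair posting list such as (`pos=DT`, `word=cat`) is typically much shorter than the unary list for a frequent feature like `pos=DT`, so fewer and cheaper intersections are needed. The trade-off in this sketch is index size: every position contributes one binary entry per pair of features on it and its neighbour.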

Authors

Peter Ljunglöf

Chalmers, Computer Science and Engineering (Chalmers), Functional Programming

University of Gothenburg

Nicholas Smallbone

Chalmers, Computer Science and Engineering (Chalmers), Functional Programming

University of Gothenburg

Mijo Thoresson

Student at Chalmers

Victor Salomonsson

Student at Chalmers

20th Conference on Natural Language Processing, KONVENS 2024 - Proceedings of the Conference

Pages 149-158

20th Conference on Natural Language Processing (KONVENS 2024)
Vienna, Austria

Subject Categories (SSIF 2011)

Language Technology (Computational Linguistics)

Latest update

2/20/2025