Binary indexes for optimising corpus queries
Paper in proceedings, 2024

Being able to search for patterns in annotated text corpora is crucial for many research disciplines. However, searching for complex patterns in large corpora can take a long time: sometimes several minutes or even hours. We investigate how inverted indexes, and binary indexes in particular, can be used for efficient searching in large annotated corpora. We show how corpus queries are translated into lookups in unary and binary inverted indexes, and give strategies for combining the results efficiently using set operations. In addition, we discuss how binary indexes can be used for more complex query types.
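The idea in the abstract can be illustrated with a minimal sketch; this is not the authors' implementation. It assumes that each query token tests a single (attribute, value) pair, that the unary index maps such a pair to the sorted positions where it holds, and that the binary index maps a pair of features on adjacent tokens to the sorted positions of the first token. The query is covered by adjacent pairs, each binary lookup is shifted back to a candidate start position, and the candidate lists are combined with a sorted-list merge intersection. The names `build_indexes`, `intersect` and `search` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Token = Dict[str, str]        # e.g. {"word": "the", "pos": "DET"}
Constraint = Tuple[str, str]  # hypothetical: one (attribute, value) test per query token


def build_indexes(corpus: Sequence[Token]):
    """Build a unary and a binary inverted index over a token sequence.

    Positions are appended in increasing order, so every posting list is sorted.
    """
    unary: Dict[Constraint, List[int]] = defaultdict(list)
    binary: Dict[Tuple[Constraint, Constraint], List[int]] = defaultdict(list)
    for pos, tok in enumerate(corpus):
        feats = list(tok.items())
        for f in feats:
            unary[f].append(pos)
        if pos + 1 < len(corpus):
            # Index every feature of this token combined with every feature of the next one.
            for f in feats:
                for g in corpus[pos + 1].items():
                    binary[(f, g)].append(pos)
    return unary, binary


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Linear-time merge intersection of two sorted position lists."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(query: Sequence[Constraint], unary, binary) -> List[int]:
    """Return start positions s such that query[k] holds at position s + k for every k."""
    n = len(query)
    if n == 0:
        return []
    if n == 1:
        return list(unary.get(query[0], []))
    # Cover the query with adjacent pairs (0,1), (2,3), ...; if n is odd,
    # the last pair (n-2, n-1) overlaps the previous one.
    offsets = list(range(0, n - 1, 2))
    if n % 2 == 1:
        offsets.append(n - 2)
    candidates = []
    for k in offsets:
        hits = binary.get((query[k], query[k + 1]), [])
        # A pair match at p means the whole query would start at p - k.
        candidates.append([p - k for p in hits if p - k >= 0])
    candidates.sort(key=len)  # intersect the shortest lists first
    result = candidates[0]
    for other in candidates[1:]:
        result = intersect(result, other)
    return result


if __name__ == "__main__":
    corpus = [
        {"word": "the", "pos": "DET"}, {"word": "old", "pos": "ADJ"},
        {"word": "cat", "pos": "NOUN"}, {"word": "saw", "pos": "VERB"},
        {"word": "the", "pos": "DET"}, {"word": "dog", "pos": "NOUN"},
    ]
    unary, binary = build_indexes(corpus)
    print(search([("pos", "DET"), ("pos", "NOUN")], unary, binary))  # [4]
    print(search([("pos", "DET"), ("pos", "ADJ"), ("pos", "NOUN")], unary, binary))  # [0]
```

Starting the intersection from the shortest candidate list keeps intermediate results small. A binary lookup already filters on two tokens at once, so it typically returns far fewer positions than the two corresponding unary lookups; the strategies in the paper itself may differ from this simple pair covering.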

Authors

Peter Ljunglöf

Chalmers, Computer Science and Engineering (Chalmers), Functional Programming

Nicholas Smallbone

Chalmers, Computer Science and Engineering (Chalmers), Functional Programming

Mijo Thoresson

Chalmers

Victor Salomonsson

Chalmers

Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)

Pages 149-158

20th Conference on Natural Language Processing (KONVENS 2024)
Wien, Austria

Subject Categories

Language Technology (Computational Linguistics)

More information

Latest update

10/24/2024