Empirical Observations regarding Predictability in User Access-Behavior in a Distributed Digital Library System
Paper in proceedings, 2002
Document archives today are often geographically
distributed but not replicated. This can result in
poor quality of service, in terms of reduced
availability and long user-perceived access times,
especially during peak hours. Indiscriminate
replication is not feasible because of the sheer size
of the database and the cost of administering it. The
goal of an ongoing project is to study how effectively
caching techniques such as prefetching and selective
preloading improve the quality of service of digital
library systems.
In this paper, we use user access logs from DADS, a
digital library system developed at the Technical
Knowledge Center of Denmark, DTV, to analyze whether
user access behavior is predictable enough to guide
the choice of which articles to prefetch or preload.
We have found that once a literature search has been
narrowed down to fewer than ten articles, there is a
high likelihood that some of them will eventually be
downloaded. This suggests that prefetching can be used
to hide the article transfer latency. We have also
found that as many as 80% of the article downloads are
confined to less than 20% of the journals. This
suggests that preloading a small fraction of the
digital library database can significantly shorten the
access latency as well as improve availability.
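Both findings can be illustrated with a minimal Python sketch. It is not the analysis used in the paper. It assumes a hypothetical, simplified log format in which each download event has already been reduced to a journal identifier. `journal_concentration` computes the smallest fraction of journals that together account for a given share of downloads, which is the kind of measure behind the "80% of downloads in under 20% of journals" observation. `articles_to_prefetch` is a hypothetical trigger that prefetches an entire result set once a search has been narrowed to fewer than ten articles.

```python
from collections import Counter


def journal_concentration(downloads, share=0.8):
    """Smallest fraction of journals that accounts for at least `share` of downloads.

    `downloads` is a list of journal identifiers, one entry per download event
    (a hypothetical stand-in for parsed access-log records).
    """
    counts = Counter(downloads)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = 0
    # Walk journals from most to least downloaded until `share` is reached.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= share * total:
            return rank / len(counts)
    return 1.0


# Hypothetical threshold mirroring "fewer than ten articles".
PREFETCH_THRESHOLD = 10


def articles_to_prefetch(result_set, threshold=PREFETCH_THRESHOLD):
    """Return every article in the result set once it is small enough, else nothing."""
    return list(result_set) if len(result_set) < threshold else []


if __name__ == "__main__":
    # Toy log: journal "A" receives 16 of 20 downloads, spread over 5 journals.
    log = ["A"] * 16 + ["B", "C", "D", "E"]
    print(journal_concentration(log))       # fraction of journals covering 80%
    print(articles_to_prefetch(["a1", "a2"]))
```

Ranking journals by download count and accumulating from the top keeps the measure simple. A real preloading policy would also have to weigh journal sizes and storage budgets, which this sketch ignores.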