Empirical Observations regarding Predictability in User Access-Behavior in a Distributed Digital Library System
Preprint, 2003
Today, document archives are geographically distributed but often not
replicated. This can result in poor quality of service, in the form
of reduced availability and long user-perceived access times. Instead
of indiscriminate replication, we study the effectiveness of caching
techniques such as prefetching and selective preloading.
Using access logs from DADS, a digital library system for scientific
journal articles developed at DTV, the Technical Knowledge Center of
Denmark, we analyze whether user access behavior is predictable
enough to decide which articles to prefetch or preload.
We have found that once a literature search has been
narrowed to ten or fewer articles, there is a high likelihood that
some of them will eventually be downloaded. This suggests that
prefetching can be used to hide the article transfer latency. We
have also found that 80% of the article downloads are confined to
fewer than 20% of the journals, so preloading a small fraction of
the digital library database could significantly shorten access
latency and improve availability.
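
The concentration of downloads in a few journals can be checked on any download log with a short computation. The following is a minimal sketch, not the analysis code used for DADS: it assumes the log has already been reduced to one journal identifier per downloaded article, and the function name `journal_fraction_for_coverage` and its parameters are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def journal_fraction_for_coverage(
    journal_ids: Iterable[str],
    coverage: float = 0.8,
    total_journals: Optional[int] = None,
) -> float:
    """Return the fraction of journals needed to cover `coverage` of downloads.

    journal_ids: one journal identifier per download (assumed log format).
    total_journals: number of journals in the library; if omitted, only
    journals that appear in the log are counted (an assumption that
    overstates the fraction when many journals are never downloaded).
    """
    counts = Counter(journal_ids)
    total_downloads = sum(counts.values())
    if total_downloads == 0:
        raise ValueError("the log contains no downloads")

    n_journals = total_journals if total_journals is not None else len(counts)
    target = coverage * total_downloads

    covered = 0
    needed = 0
    # Walk through journals from most to least downloaded until the
    # requested share of downloads is reached.
    for _, downloads in counts.most_common():
        covered += downloads
        needed += 1
        if covered >= target:
            break
    return needed / n_journals


if __name__ == "__main__":
    # Toy log: journal "A" has 6 downloads, "B" has 3, "C" has 1.
    toy_log = ["A"] * 6 + ["B"] * 3 + ["C"]
    print(journal_fraction_for_coverage(toy_log, 0.8, total_journals=10))
```

A result below 0.2 for `coverage=0.8` would correspond to the observation reported above; journals selected this way are the natural candidates for preloading.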