E-Thesis

Supporting user selection of digital libraries. / Helen Margaret Dodd

Swansea University Author: Helen Margaret Dodd

Abstract

Subject specialists and researchers often face the problem of identifying authoritative collections: those directly about their topic of interest, to which they regularly return to satisfy related information needs or monitor for new material. Discovery of such collections is often incidental or rel...

Published: 2013
Institution: Swansea University
Degree level: Doctoral
Degree name: Ph.D
URI: https://cronfa.swan.ac.uk/Record/cronfa42656
first_indexed 2018-08-02T18:55:14Z
last_indexed 2018-08-03T10:10:43Z
id cronfa42656
recordtype RisThesis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2018-08-02T16:24:30.0085919</datestamp><bib-version>v2</bib-version><id>42656</id><entry>2018-08-02</entry><title>Supporting user selection of digital libraries.</title><swanseaauthors><author><sid>84429f17db964f7de32c03d2d8bf09b8</sid><ORCID>NULL</ORCID><firstname>Helen Margaret</firstname><surname>Dodd</surname><name>Helen Margaret Dodd</name><active>true</active><ethesisStudent>true</ethesisStudent></author></swanseaauthors><date>2018-08-02</date><abstract>Subject specialists and researchers often face the problem of identifying authoritative collections: those directly about their topic of interest, to which they regularly return to satisfy related information needs or monitor for new material. Discovery of such collections is often incidental or relies on suggestions from domain experts. Services such as general purpose search engines and repository directories offer limited support for this search task. As such, there is a clear need for a search service specifically to assist users in finding collections that can serve both their current and future information needs; we refer to this task herein as collection suggestion. However, developing an effective search service of this kind requires fundamental research. There are several preconditions that should be addressed; it is these that form the focus of this thesis. We summarise these areas as follows. An effective search service calls for an appropriate algorithm; in this instance, an algorithm for ranking collections with respect to the user's query. To this end, we investigate the applicability of existing algorithms, from relevant domains (collection selection and query performance prediction), to collection suggestion. In addition, towards identifying an optimal algorithm for a collection suggestion search service, we specify and test a new algorithm (and several alternative variants), designed specifically for this task. 
The requirement of an appropriate algorithm presents the question of how we evaluate the effectiveness of an algorithm. We have formulated a methodology (comprising evaluation strategies and performance measures) and developed apparatus for evaluating algorithms, with respect to collection suggestion. As far as possible, we have drawn on and extended established algorithm evaluation techniques, to ensure our work follows the expectations of information retrieval research. Our empirical work is conducted over several synthetic and realistic test data sets: we use established data sets built from the TREC document corpus, in addition to data sets of our own compilation, comprising data from real repositories. This combination of test data types ensures a rigorous test environment for algorithms. Over our test environment, we have found three algorithms to be potentially suitable for application in a collection suggestion search service. One collection selection algorithm (CORI), and two variants of our own algorithm were shown to have strong and consistent performance, across the range of test data sets and performance measures used.</abstract><type>E-Thesis</type><journal/><journalNumber></journalNumber><paginationStart/><paginationEnd/><publisher/><placeOfPublication/><isbnPrint/><issnPrint/><issnElectronic/><keywords>Computer science.</keywords><publishedDay>31</publishedDay><publishedMonth>12</publishedMonth><publishedYear>2013</publishedYear><publishedDate>2013-12-31</publishedDate><doi/><url/><notes/><college>COLLEGE NANME</college><department>Computer Science</department><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><degreelevel>Doctoral</degreelevel><degreename>Ph.D</degreename><apcterm/><lastEdited>2018-08-02T16:24:30.0085919</lastEdited><Created>2018-08-02T16:24:30.0085919</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer 
Science</level></path><authors><author><firstname>Helen Margaret</firstname><surname>Dodd</surname><orcid>NULL</orcid><order>1</order></author></authors><documents><document><filename>0042656-02082018162511.pdf</filename><originalFilename>10805432.pdf</originalFilename><uploaded>2018-08-02T16:25:11.6300000</uploaded><type>Output</type><contentLength>12947798</contentLength><contentType>application/pdf</contentType><version>E-Thesis</version><cronfaStatus>true</cronfaStatus><embargoDate>2018-08-02T16:25:11.6300000</embargoDate><copyrightCorrect>false</copyrightCorrect></document></documents><OutputDurs/></rfc1807>
title Supporting user selection of digital libraries.
author_id_str_mv 84429f17db964f7de32c03d2d8bf09b8
author_id_fullname_str_mv 84429f17db964f7de32c03d2d8bf09b8_***_Helen Margaret Dodd
author Helen Margaret Dodd
author2 Helen Margaret Dodd
format E-Thesis
publishDate 2013
institution Swansea University
college_str Faculty of Science and Engineering
hierarchytype
hierarchy_top_id facultyofscienceandengineering
hierarchy_top_title Faculty of Science and Engineering
hierarchy_parent_id facultyofscienceandengineering
hierarchy_parent_title Faculty of Science and Engineering
department_str School of Mathematics and Computer Science - Computer Science; Faculty of Science and Engineering
document_store_str 1
active_str 0
description Subject specialists and researchers often face the problem of identifying authoritative collections: those directly about their topic of interest, to which they regularly return to satisfy related information needs or monitor for new material. Discovery of such collections is often incidental or relies on suggestions from domain experts. Services such as general purpose search engines and repository directories offer limited support for this search task. As such, there is a clear need for a search service specifically to assist users in finding collections that can serve both their current and future information needs; we refer to this task herein as collection suggestion. However, developing an effective search service of this kind requires fundamental research. There are several preconditions that should be addressed; it is these that form the focus of this thesis. We summarise these areas as follows. An effective search service calls for an appropriate algorithm; in this instance, an algorithm for ranking collections with respect to the user's query. To this end, we investigate the applicability of existing algorithms, from relevant domains (collection selection and query performance prediction), to collection suggestion. In addition, towards identifying an optimal algorithm for a collection suggestion search service, we specify and test a new algorithm (and several alternative variants), designed specifically for this task. The requirement of an appropriate algorithm presents the question of how we evaluate the effectiveness of an algorithm. We have formulated a methodology (comprising evaluation strategies and performance measures) and developed apparatus for evaluating algorithms, with respect to collection suggestion. As far as possible, we have drawn on and extended established algorithm evaluation techniques, to ensure our work follows the expectations of information retrieval research. 
Our empirical work is conducted over several synthetic and realistic test data sets: we use established data sets built from the TREC document corpus, in addition to data sets of our own compilation, comprising data from real repositories. This combination of test data types ensures a rigorous test environment for algorithms. Over our test environment, we have found three algorithms to be potentially suitable for application in a collection suggestion search service. One collection selection algorithm (CORI), and two variants of our own algorithm were shown to have strong and consistent performance, across the range of test data sets and performance measures used.
published_date 2013-12-31T13:30:57Z