Methodological Diversity in the Evaluation of Cultural Heritage Digital Libraries and Archives: An Analysis of Frameworks and Methods / Diversité méthodologique dans l’évaluation des bibliothèques et des archives numériques du patrimoine culturel : Une analyse des cadres et des méthodes
Digital library evaluation has become increasingly important in information science, yet there has been minimal evaluative work focusing on digital cultural heritage. We report on a comprehensive review of methodologies and frameworks used in the evaluation of cultural heritage digital libraries and archives. Empirical studies are examined using Tefko Saracevic’s digital library evaluation framework to identify models, frameworks, and methodologies in the literature and to categorize these past evaluative approaches. Through the classification and critique of evaluative types and trends, we aim to develop a set of recommendations for the future evaluation of cultural heritage digital libraries and archives.
L’évaluation de bibliothèques numériques a gagné beaucoup en importance dans les sciences de l’information, toutefois, il y a eu peu de travail sur l’aspect de l’héritage culturel numérique. Cet article présente un sommaire des méthodologies, plateformes et techniques utilisées pour l’évaluation des bibliothèques et archives d’héritage culturel numérique. Des études empiriques sont examinées en utilisant le cadre d’évaluation de bibliothèques numériques de Tefko Saracevic pour identifier les modèles, cadres et méthodologies dans la littérature et les catégoriser selon leurs approches. Avec la classification et la critique des types d’approches et de tendances, nous établirons des lignes directrices et des recommandations pour les évaluations de bibliothèques d’héritage culturel futures.
cultural heritage, digital libraries, evaluation, methodologies
Héritage culturel, Bibliothèques numériques, Évaluations, Méthodologies
Introduction
The heterogeneous materials and multicultural user groups represented in the cultural heritage field pose a unique challenge for both the design and evaluation of information systems (Petras, Stiller, and Gäde 2013). Evaluation of cultural heritage digital libraries (CHDLs) typically falls under the system-centric or user-centric perspective, and these varying approaches bring to light the features and challenges associated with different evaluative techniques (Petras, Stiller, and Gäde 2013). While not specific to the cultural heritage field, Saracevic (2004) identified seven approaches to digital library (DL) evaluation, all addressing different components or goals: systems-centred, human-centred, usability-centred, anthropological, sociological, economic, and ethnographic. Saracevic (2000) is also credited with introducing the five main elements that frame DL evaluations: construct, context, criteria, measures, and methodology. Other relevant evaluation frameworks include Nicholson’s (2004) holistic matrix model, which examines systems and use from an internal and external view, and Fuhr et al.’s (2001, 2007) DELOS evaluation framework, which is centred on three approaches: system evaluation, usefulness evaluation, and usability evaluation. Ultimately, past research has deemed interface usability, system performance, and collection value to be the most agreed-upon evaluation criteria (Xie 2008).
In this article, we discuss the results of a systematic review and meta-analysis on the evaluation of cultural heritage digital libraries and archives. To complete this analysis, we first located the existing literature on the evaluation of CHDLs from library and information studies databases and resources and compiled a bibliography of relevant literature. During this process, we ran into a problem that would recur throughout our review—namely, the difficulty in singularly defining CHDLs. For the purposes of this study, a CHDL was defined as an online repository of digital objects related to the cultural heritage of one or more cultural groups. To broaden the scope of this project, evaluations of digital cultural heritage articles involving institutions across the galleries, libraries, archives, and museums (GLAM) sector were included. Second, we examined each article to identify the frameworks, approaches, methodologies, and data-gathering tools that had been used in previous evaluative studies. During this stage, we removed articles that did not include specific information about their frameworks or did not include a CHDL evaluation, bringing the number of included articles from 103 to 59. Many of the articles that were identified discussed evaluation frameworks, but surprisingly few tested the frameworks on a specific CHDL, which limited the study’s sample size. The remaining information was tabulated and categorized in a large spreadsheet informed by Saracevic’s (2000) five elements that frame DL evaluations. Finally, we identified the specific cultural elements that were being evaluated in these DL studies. While DLs have been an increasingly important area of study in library and information studies, the specificity and sensitivity required when dealing with cultural heritage in the digital space have often been overlooked.
To address this issue, this article will demonstrate the ways in which past CHDL evaluations have failed to account for cultural context and community engagement and discuss the areas that have been focused on instead. We will then provide a series of evaluative guidelines and recommendations for future research on cultural heritage digital libraries and archives, which can also inform the development of future evaluation frameworks.
DL evaluation
DL evaluation studies are intended to ascertain the level of performance or the value of DL systems. DLs can be judged in different ways and against different priorities or measures of importance; they are often evaluated on their effectiveness, efficiency, and usability. Based on our findings, the majority of DL evaluations focus on usability studies, from both a user and an expert perspective. Usability can mean a number of different things, but Nielsen (1993) points out that its meanings are typically encompassed by five main attributes: learnability, efficiency, memorability, errors, and satisfaction. Usability studies cover a broad range of topics and contexts, making them amenable to a variety of research methods, including interviews, observation, focus groups, web log analysis, usage statistics, usability testing, and surveys (Xie 2006). The collection of user data then allows researchers to identify and understand user needs, address potential problems, and assess user satisfaction. While this study will show that there is still a clear focus on usability evaluations, CHDL evaluation studies may also examine system content and performance. These system-focused evaluations often concentrate on system speed and search capabilities, metadata and linked data capabilities, interface design, and content coverage (Xie 2006). Although these studies can also be conducted from the user perspective, they are more likely than usability evaluations to be conducted from the expert perspective, due to the nature of the evaluation methods.
Usability and performance-based evaluations (including interface and system-centred) encompass the vast majority of CHDL evaluations identified in this study; however, this suggests a false dichotomy between the two contexts. In actuality, this study found that the vast majority of CHDL evaluations combine aspects of at least two evaluation contexts and occasionally combine all three. This is due to the overwhelming user focus of these studies: even when examining the library interface or system performance, most studies took usability into account or included user testing of the system. A less common, but still identifiable, type of DL evaluation is an impact study. Impact studies examine the impact of digital libraries on their users and communities through a number of components (Xie 2006). Xie (2006) notes that these studies often incorporate a longitudinal component in order to identify usage and research trends in target communities over time. Impact studies are important in identifying the needs of user communities and solidifying the importance of CHDLs in their communities. The trends and usage statistics gathered have implications for collections management in the digital space (Xie 2006). DL impact studies also lend themselves well to usability studies, as perceived usefulness and ease of use are both considered determinants of DL acceptance, which directly affects the potential impact of a DL (Xie 2006).
Although it is clear that there has been an increase in research on DL evaluation over time, there has been little discussion of evaluation criteria, especially when dealing with cultural heritage content in CHDLs. Most current research makes use of pre-existing evaluation criteria for traditional libraries or evaluation models made for DLs, but little research has focused on the creation of an evaluation framework specific to cultural heritage institutions (Stiller and Petras 2018; Xie 2006). Marchionini (2000) has suggested that DLs are extensions and augmentations of physical libraries; however, it may be useful to look to other frameworks surrounding digital technologies for evaluation criteria, including storage capacity, cost per operation, and response time. In many ways, DLs straddle the traditional and the contemporary, and the inclusion of evaluative criteria that prioritize both of these sets of values is critical to the success of digital and traditional library integration.
Ultimately, as a marriage between the two, Saracevic (2000) identified a set of criteria specifically for DLs: traditional library criteria—collection (purpose, scope, authority, coverage, currency, audience, cost, format, treatment, and preservation), information (accuracy, appropriateness, links, representation, uniqueness, comparability, and presentation), use (accessibility, availability, searchability, and usability), and standards; traditional information retrieval criteria—relevance (precision and recall), satisfaction, and success; and traditional human-computer interaction/interface criteria—usability, functionality, efforts, task appropriateness, and failures. As stated previously, a majority of past DL evaluations in the cultural heritage sector have been usability studies, although other studies have examined collections, systems, and impact. However, these evaluation criteria have focused on DLs as a general category and have failed to account for the unique position of CHDLs in the preservation of cultural heritage resources.
In this study, we used Saracevic’s (2000) evaluation framework, which our analysis demonstrated to be one of the more widely used frameworks in evaluation research (Stiller and Petras 2018). The elements incorporated into this framework have been adapted in other DL evaluation frameworks, including DELOS (Candela et al. 2007) and MEDaL (Xie and Matusiak 2016). While these evaluation frameworks are the most relevant for our study, different approaches have been taken for other DL projects. These include the Interaction Triptych Evaluation Model, which defines users, systems, and content as the most important evaluation components; Tsakonas and Papatheodorou’s (2011) DiLEO DL evaluation ontology, a research effort to model components from different frameworks in order to guide the understanding of the DL evaluation process; Gonçalves et al.’s (2004) 5S model, which evaluates DLs on the basis of five main components (streams, structures, spaces, scenarios, and societies); and Blandford et al.’s (2008) PRET A Rapporter framework, which supports the design of evaluation user studies (Stiller and Petras 2018). Saracevic’s (2000) evaluation framework introduces five elements that frame the evaluation of DLs: construct, context, criteria, measures, and methodology. This framework was chosen because there were no widely accepted, identifiable frameworks created specifically for CHDLs. Each element represents a component of our study’s evaluation and is described in the following way:
1. Construct for evaluation: what is there to evaluate; what is encompassed by a DL; and what elements are involved in the evaluation?
2. Context of evaluation: select a goal, framework, or level of evaluation: what is the level of evaluation and what is critical for the selected level?
3. Criteria reflecting performance as related to selected objectives: what parameters of performance to concentrate on and what dimension or characteristic to evaluate?
4. Measures reflecting selected criteria to record the performance: what specific measures to use for a given criterion?
5. Methodology for doing evaluation: what measuring instruments to use and what procedures to use for data collection and analysis?
The analysis in this article closely follows the elements presented by Saracevic (2000), focusing on evaluation studies and use cases of CHDLs. This article will thus concentrate on the frameworks and criteria used in the evaluation of CHDLs as well as what we learned from the outcomes of these studies and recommendations for future evaluation frameworks better suited for cultural heritage online environments.
Data gathering and analysis methods
For this analysis, we gathered relevant studies through a systematic search of works related to the evaluation of CHDLs. Searches were conducted on the Directory of Open Access Journals, the Association for Computing Machinery’s (ACM) Digital Library, Scopus, Library and Information Science Source, and the University of Alberta’s general article search feature through EBSCO and were not limited by time period or geographic area. Terms used in these searches included cultural heritage, DLs, digital archives, evaluation (user and system), methodologies, methods, approaches, and frameworks. A resulting list of 103 articles was collected before being reviewed for relevance. The collected articles were primarily limited by language, including only English and Spanish articles. Relevance was determined through further examination of the content in the collected articles. Articles were removed if DL evaluation was discussed but no evaluation was conducted. Additionally, articles that described already existing evaluation frameworks but did not include a use case scenario of the framework were excluded. Fifty-nine articles remained in the study following this review process, each covering a specific CHDL evaluation. Following this, each of the remaining articles was analysed according to Saracevic’s five evaluation elements (construct, context, criteria, measures, and methodology), with the relevant material extracted and placed in a spreadsheet. For this study, the construct category resulted in a single group (cultural heritage institutions), as our focus was primarily on framework and criteria, not the specific institution involved in the evaluation.
This conclusion was reached through our grounded theory approach, defined by Glaser and Strauss (1967, 1) as “the discovery of theory from data—systematically obtained and analysed in social research.” This qualitative method informed our analysis and the interpretation of the collected CHDL data. While the construct did not end up being central to this study, it is important to note that Europeana and CULTURA (CULTivating Understanding and Research through Adaptivity) were the most frequently evaluated projects, which can limit the large-scale applicability of the results of our work.1
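The coding and tabulation step described above can be sketched as a simple script: each reviewed article is coded against Saracevic’s (2000) five elements and the categories are then tallied. The record structure, field names, and sample codings below are hypothetical illustrations, not the actual coding sheet used in this study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CodedArticle:
    """One row of a hypothetical review spreadsheet, coded against
    Saracevic's (2000) five elements."""
    citation: str
    construct: str    # a single group in this study: "cultural heritage institution"
    context: str      # "user", "interface", or "system"
    criteria: tuple   # e.g. ("usability", "cultural content")
    measures: tuple   # e.g. ("task success", "precision")
    methodology: str  # e.g. "usability study", "log file analysis"


def tally(articles):
    """Count evaluation contexts and methodologies across the sample."""
    return {
        "context": Counter(a.context for a in articles),
        "methodology": Counter(a.methodology for a in articles),
    }


# Two illustrative codings (drawn loosely from Appendix 1).
sample = [
    CodedArticle("Agosti et al. (2013)", "cultural heritage institution",
                 "user", ("usability", "accessibility"),
                 ("task success",), "usability study"),
    CodedArticle("Gordea (2014)", "cultural heritage institution",
                 "system", ("performance evaluation", "data quality"),
                 ("precision",), "criteria-based study"),
]
counts = tally(sample)
```

Tallies of this kind produced the per-perspective and per-method counts reported in Tables 2 and 3.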
Analysis and discussion
In this section, we present the evaluation frameworks identified in the studies reviewed for this project, followed by the results of assessing those evaluations with Saracevic’s framework; the results are further categorized in the tables that follow.
Frameworks
The frameworks identified in this study cover a range of GLAM institutions, including libraries, archives, and museums. They differ in a number of ways, including priority (system, user, interface, impact), medium (print, audio, video), and focus (technology, usability). Table 1 presents a compiled list of frameworks described in the aforementioned works.
Context
As the constructs for this study have all been simplified into the large family of “cultural heritage institutions,” the first evaluation element of Saracevic’s (2000) model is context. The construct was kept intentionally broad, as the focus of our research was on the specific ways in which CHDLs were evaluated and not on the CHDLs themselves (although the CHDLs can be seen in Appendix 1). Following this, context looks at the perspective used in the evaluation, whether it is user centred, interface centred, or system centred. The results indicate that user-centred evaluations are the most common, which differs from Saracevic’s (2004) assertion that system-centred contexts are the most prevalent in DL evaluations. Whether this is simply a difference in categorization or whether CHDLs really differ from DLs in their evaluation priorities remains unknown. The majority of user-centred evaluations targeted individual user experiences through direct user involvement in the process, whereas interface and system-centred designs frequently combined the user experience with interface usability or system search efficiency. Table 2 summarizes the number of evaluations per perspective for this study.
Methods
The methods and methodologies identified in this study were numerous and varied, but they have been categorized into four main groups, adapted from Stiller and Petras’s (2018) study on Europeana’s evaluations. To understand how best to address future CHDL evaluations, it is critical to understand what approaches have been used in the past. While a majority of the evaluations were criteria based, many of the criteria were custom designed for individual studies. Additionally, there is overlap between the criteria-based and usability studies, as many of them used usability as a criterion or emphasized usability in their studies (see Table 3).
Criteria
Many of the studies included in this analysis did not provide a thorough list of the criteria used in their given study. However, by using Stiller and Petras’s (2018) criteria outline, we have categorized the criteria in the studies based on the language used around the evaluation and its outcomes. We have added one additional criterion to the list: cultural content. This can be seen as an extension of the coverage criterion and is relevant in cases where the CHDL has explicitly emphasized cultural content directly relevant to its target population or institution. The categorization that we have created provides a general overview of the priorities of DL evaluations regardless of context, which is helpful for the implementation of future DL evaluation frameworks (see Table 4).
Cultural elements
As this study is focused on CHDLs, we have included an additional table for the cultural content criterion. Table 5 specifies the articles that focus on cultural heritage as well as the cultural element that they prioritize. Additionally, all of the information gathered during this review has been aggregated and is presented alphabetically by author surname in Appendix 1.
Saracevic’s (2000) framework has allowed us to analyse a number of CHDL evaluations as well as to identify gaps that should be addressed. Through our analysis of 59 evaluation studies, we identified a number of gaps in CHDL evaluation, some of which are unique to cultural heritage institutions and some of which could be applicable to both DLs and CHDLs:
• The identified studies strongly lean towards well-known CHDLs, such as CULTURA and Europeana, which narrows the view of CHDL evaluation by privileging certain institutions and potentially overlooking important conversations elsewhere.
• There is a strong focus on user-centred perspectives. While the user is a critical component of CHDLs, especially in an evaluative capacity, this focus again comes at the expense of other perspectives.
• Many evaluation studies suffer from a lack of description of both their methodologies and their criteria, making it difficult to fully analyse, re-use, or replicate studies.
• In terms of cultural heritage, many CHDL evaluations fail to mention or account for culture in any capacity. Few studies have talked about community collaboration for a specific cultural group or the creation of a culturally driven framework.
Although it is clear that there have been large-scale evaluations of CHDLs, they are often evaluated in the same way as DLs, devoid of the cultural element. Not every CHDL study may need a culturally driven evaluation framework; however, the importance of culture and community should not be entirely ignored, as it is in most of these studies. This is particularly important when speaking of DLs made for historically disenfranchised communities, which have often been excluded from academic narratives about their own culture. In these cases, community engagement and cultural awareness are critical not only for the library’s development but also for any future evaluations, whether that be an analysis of community satisfaction, the collection size of cultural material, or the presence of items in local languages and dialects. No evaluative framework for cultural content in a digital environment has been determined as of this study, and, in the following sections, we make some recommendations for possible inclusions.
Guidelines and recommendations
It has been argued that simplicity is a fundamental principle of building search interfaces (Aula and Käki 2005; Buttenfield 1999). This is why interface analysis from the user perspective is one of the most prevalent methods of CHDL evaluation identified in this study as well as one of the primary focuses of the human-computer interaction field. The guidelines and recommendations reported here are of great importance to those involved in the creation and study of CHDLs since they are intended to offer alternatives to current CHDL iterations as well as to reveal problems with prior evaluative work and considerations for future evaluations and evaluation frameworks (Gaona-García, Martin-Moncunill, and Montenegro-Marin 2017). While the included guidelines centre on cultural knowledge management as opposed to culturally appropriate evaluation methods, our belief is that these two concepts are intertwined. Knowledge management and evaluation are reciprocal processes, and if more CHDLs begin to work closely with relevant populations during testing and development stages, there will be a greater drive to account for these contexts and populations during later evaluations. CHDLs should not just inform communities but be informed by them. Our guidelines for more effective knowledge management in cultural heritage repositories, inspired by DL interface guidelines created by Gaona-García, Martin-Moncunill, and Montenegro-Marin (2017), are as follows:
• include the cultural community in CHDL creation, including working with the community to identify what is important to them in a cultural heritage setting and implementing those findings where possible (that is, culturally appropriate and sensitive metadata, coverage of specific concepts and events, and locally tested interface);
• define methods for linking data objects through cultural topics or knowledge areas;
• define an enriched culturally relevant language within the appropriate knowledge representation scheme to facilitate mapping processes to external ontologies hosted in other cultural heritage repositories;
• use linked data processes to enable interoperability between heterogeneous CHDL repositories;
• provide an easily navigable interface to allow users to gain an overview of the cultural areas of expertise or interest represented; and
• integrate metadata with site navigation to make thematic site exploration easier.
As the alteration of CHDL management practices will not necessarily affect CHDL evaluation frameworks, we also propose a number of recommendations to consider before undertaking future evaluations of CHDLs and other cultural heritage repositories. These recommendations are in response to the current lack of culturally driven CHDL evaluation and include the following: [End Page 325]
• include the cultural community in CHDL testing and evaluation;
• identify the number of relevant digital resources retrieved in each query process;
• examine the relevance of metadata attached to digital resources retrieved in each query process;
• track the usage of digital resources according to their use case—academic, scientific, public, and targeted cultural group;
• identify the number of associated digital resources for a cultural topic or thematic area; and
• identify the number of multilingual digital resources as well as the system’s multilingual search capabilities.
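Several of the recommendations above reduce to countable measures over a CHDL’s metadata. A minimal sketch of two such measures, assuming a simplified record format (the `languages` and `topics` field names are hypothetical, not drawn from any particular CHDL’s schema):

```python
def multilingual_count(records):
    """Number of digital resources described in more than one language."""
    return sum(1 for r in records if len(r.get("languages", [])) > 1)


def resources_per_topic(records):
    """Number of associated digital resources per cultural topic or thematic area."""
    counts = {}
    for r in records:
        for topic in r.get("topics", []):
            counts[topic] = counts.get(topic, 0) + 1
    return counts


# Two illustrative metadata records.
records = [
    {"id": "obj1", "languages": ["en", "iu"], "topics": ["oral history"]},
    {"id": "obj2", "languages": ["en"], "topics": ["oral history", "textiles"]},
]
```

Measures such as these could be reported alongside qualitative findings to make a cultural-content evaluation replicable across repositories.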
Conclusion
This analysis and critique of 59 CHDL evaluation studies provides only a brief overview of the variety of evaluative frameworks, methods, and criteria that have been presented and adapted over the years. While all of the studies included in this article have been helpful in the formation of this overview, they have also shed light on the shortcomings of the current field of evaluative literature in CHDL. Most notably, the absence of a cultural component in many CHDL evaluations is a significant oversight. The adoption of general DL evaluation frameworks, much like the one used in this study, has been part of the problem, as these pre-formulated frameworks were not intended for a cultural heritage audience. This does not mean that these formative frameworks must be scrapped but, rather, that adjustments and adaptations need to be made. For example, what is the involvement of the cultural community in the creation and testing of the CHDL? Is that information being reported? Are cultural communities being included in the evaluative process? Are their needs being met, whether through coverage, interface design, or metadata vocabulary? Ultimately, we strongly recommend the inclusion of specific cultural components in future CHDL evaluations in order to solidify their distinction from other DLs.
evillanu@ualberta.ca
ashiri@ualberta.ca
Note
1. Europeana, https://www.europeana.eu/en (accessed December 15, 2020); CULTivating Understanding and Research through Adaptivity, http://www.cultura-strep.eu/ (accessed December 15, 2020).
References
Appendix 1. Summary of the meta-analysis of the 59 evaluations studied
Authors and year of source | Name | Framework/Approach | Method | Criteria | Perspective | Data type | User centred | Interface | System centred |
---|---|---|---|---|---|---|---|---|---|
Abdallah et al. (2017) | British Library, CHARM, Mazurka, I Like Music | Digital Music Library Framework | Criteria-based study | Usability, data quality, performance evaluation | User | Quantitative | x | x | |
Agosti et al. (2013) | CULTURA | Development of a new adaptive and dynamic environment, IPSA Cultura | Usability study | Usability, accessibility | User | Qualitative | x | x | |
Agosti, Orio, and Ponchia (2014) | CULTURA | Personalized embedded narratives/guided collection tours | Usability study | Accessibility, usability, user satisfaction | User | Qualitative | x | ||
Agosti, Orio, and Ponchia (2018) | CULTURA | Interaction Triptych Model | Usability study | Usability, user satisfaction, performance evaluation | Expert/user | Qualitative | x | x | x |
Aletras, Stevenson, and Clough (2012) | Europeana | Knowledge-based and corpus-based approach | Criteria-based study | Accessibility, data quality, performance evaluation | Expert | Quantitative | x | x | |
Anderson (2007) | The Glasgow Story | Impact assessment | Impact study | Impact criteria, usability, user satisfaction, usage statistics and patterns, accessibility | Expert/user | Both | x | ||
Bailey et al. (2012) | Trinity College Dublin, University of Padua, CULTURA | Experimental approach | Log file analysis | Usage statistics and patterns | Expert/user | Quantitative | x | ||
Bonacini (2019) | #iziTRAVELSicilia | Participatory community approach | Impact study | Impact criteria, user satisfaction | User | Quantitative | x | ||
Borgman et al. (2001) | Alexandria Digital Earth ProtoType | Bottom-up approach to DL design | Usability study | Usability, data quality, coverage | Expert/user | Qualitative | x | ||
Bow (2019) | The Living Archive of Aboriginal Languages | Knowledge infrastructure through a socio-technical lens | Usability study | Usability, user satisfaction, coverage, cultural content, accessibility | Expert/user | Qualitative | x | ||
Candela, Escobar, and Marco-Such (2017) | Biblioteca Virtual Miguel de Cervantes | Biblioteca Virtual Miguel de Cervantes ontology | Usability study | Accessibility, data quality, usability, user satisfaction | User | Both | x | x | x |
Crane and Wulfman (2003) | Perseus digital library, National Science Digital Library | Neglect versus need, new domains versus disciplinarity, using versus creating digital collections | Log file analysis | Usage statistics and patterns | Expert | Quantitative | x | x | |
Dobreva and Chowdhury (2010) | Europeana | TIME Evaluation Framework | Usability study | User satisfaction, coverage, usability | User | Qualitative | x | x | x |
Dorward, Reinke, and Recker (2002) | SMETE Open Federation Digital Library | Instructional Architect | Usability study | Usability | Expert/user | Qualitative | x | x | |
Farnel et al. (2017) | Digital Library North | Community-driven metadata framework | Usability study | Cultural content, coverage, accessibility, usability, data quality | User | Qualitative | x | x | |
Feinberg (2013) | Personal digital collection (that is, Pinterest, YouTube playlists) | Comparative appraisal | Criteria-based study | Data quality, cultural content, coverage | User | Both (mainly qualitative) | x | x | |
Fenlon (2013) | Digital Public Library of America | Multimodal pilot study | Criteria-based study | Data quality, error rate, performance evaluation | Expert | Qualitative | x | ||
Freire, José, and Calado (2012) | Europeana | Conditional random field models | Criteria-based study | Data quality, error rate, performance evaluation | Expert | Quantitative | x | ||
Galani and Kidd (2019) | With New Eyes I See, Rock Art on Mobile | Multimodality and reflexivity | Impact study | Accessibility, impact criteria, cultural content, user satisfaction | User | Both (mainly qualitative) | x | ||
Goodale (2016) | Europeana | PATHS system | Criteria-based study | Usability, user satisfaction | Expert/user | Both | x | x | |
Goodale et al. (2014) | Europeana | PATHS system; Interactive Information Retrieval Evaluation Framework | Log file analysis | Usage statistics and patterns, usability, user satisfaction | Expert/user | Both | x | x | |
Gordea (2014) | Europeana | Image similarity search service | Criteria-based study | Performance evaluation, data quality | Expert | Quantitative | x | ||
Hall et al. (2014) | Europeana, Wikipedia | Evaluation of four key activities that hierarchies support | Usability study | Data quality, coverage | User | Both | x | x | |
Hampson et al. (2014) | CULTURA, Biodiversity Heritage Library, Bayerische StaatsBibliothek, Europeana, Rijksmuseum cultural resources, MOSAICA project | Metadata-enhanced exploration. | Usability study/criteria-based study | Usability, user satisfaction, accessibility | User | Qualitative | x | ||
Harris et al. (2018) | Search and Mining Tools with Linguistic Analysis system | Probabilistic approach | Criteria-based study | Accessibility, data quality, performance evaluation | User | Quantitative | x | x | |
Harsányi, Rozinajová, and Andrejčíková (2012) | Art museums; national libraries of several countries | Semantic interoperability | Criteria-based study | Data quality | Expert | Qualitative | x | | |
Hill et al. (2000) | Alexandria Digital Library | Geolibrary | Criteria-based study | Usability, accessibility | User | Both | x | x | |
Hu, Ho, and Qiao (2017) | Mogao Cave Panorama Digital Library | Evaluation criteria framework covering effectiveness, efficiency, satisfaction, and interactivity | Usability study/criteria-based study | Usability, user satisfaction | Expert/user | Both | x | | |
Hu, Ng, and Xia (2018) | General cultural heritage work in China (references Mogao Cave) | User-centered design, information representation design, and grounded theory | Criteria-based study | Usability, data quality, accessibility | Expert/user | Qualitative | x | x | |
Hug and Gonzalez-Perez (2012) | Institute of Heritage Sciences (Incipit) at the Spanish National Research Council | Incipit information system design framework | Criteria-based study | Data quality, usability | Expert | Qualitative | x | x | |
Ibrahim and Ali (2018) | Malay house virtual heritage environment | (1) Information design; (2) information presentation; (3) navigation mechanism; and (4) environment setting | Criteria-based study | Usability, cultural content, user satisfaction | Expert/user | Both (mainly qualitative) | x | x | |
Jeng (2008) | New Jersey Digital Highway | Usefulness assessment; Technology Acceptance Model | Criteria-based study | Usability, user satisfaction | Expert/user | Qualitative | x | x | |
Komlodi, Caidi, and Wheeler (2004) | Biblioteca Italiana, Early Canadiana Online, Gallica, Library of Congress National Digital Library, Neumann House, Proyecto Biblioteca Digital Argentina | Four usability design guidelines: language, visual representations, content selection, and content organization | Criteria-based study | Usability, coverage, accessibility, cultural content | User | Qualitative | x | x | |
Liew (2005) | Ranfurly Collection, Niupepa Collection | Cross-cultural usability | Usability study | Usability, accessibility, data quality, cultural content, coverage | User | Qualitative | x | | |
Liew and Chowdhury (2016) | New Zealand Electronic Text Collection, Kete Horowhenua, New Zealand History Online | Economic sustainability, social sustainability, and environmental sustainability | Criteria-based study | Cultural content, coverage, accessibility, usability | User | Qualitative | x | | |
Marchionini (2000) | Perseus digital library | Evaluation as a research and problem-solving endeavour | Impact study | Impact criteria | User | Both | x | | |
Marketakis et al. (2017) | Research Space, ARIADNE | Synergy Reference Model and the X3ML mapping definition language | Criteria-based study | Performance evaluation | Expert | Quantitative | x | | |
Matthews and Aston (2012) | Wendy James’s anthropological archive | Narrative-based approach | Criteria-based study | Accessibility, usability, data quality | User | Qualitative | x | x | x |
Melucci and Orio (2004) | Digital Archive for the Venetian Music of the Eighteenth Century | System for Music Information Retrieval Environments | Log file analysis | Data quality | Expert | Quantitative | x | | |
Núñez and Repiso (2019) | 87 digital collections of cultural heritage of the Canary Islands | Interoperability | Criteria-based study | Data quality, accessibility, coverage, cultural content | Expert | Both | x | x | |
Oomen et al. (2013) | Netherlands Institute of Sound and Vision | TREC Video Retrieval Evaluation | Log file analysis | Usage statistics and patterns, user satisfaction | User | Both | x | x | |
Pallas and Economides (2008) | 210 art museum websites worldwide | Museum’s Sites Evaluation Framework | Criteria-based study | Usability, data quality, user satisfaction, coverage | Expert | Both | x | x | x
Pattuelli (2011) | Library of the University of North Carolina at Chapel Hill (Tobacco Bag Stringing) | Methontology framework, used to develop the Tobacco Bag Stringing ontology | Usability study | User satisfaction, usability, accessibility | User | Qualitative | x | x | |
Punzalan, Marsh, and Cools (2017) | American History and Anthropology Museums | Toolkit for the Impact of Digitised Scholarly Resources, and Archival Metrics | Impact study | Impact criteria | Expert/user | Qualitative | x | | |
Shiri (2018) | Digital Library North | Multi-disciplinary participatory methodological framework | Criteria-based study | Cultural content, coverage, accessibility, usability | Expert/user | Qualitative | x | | |
Shiri and Stobbs (2018) | Digital Library North | Culturally aware, multi-method and multidisciplinary user evaluation framework | Usability study/criteria-based study | Usability, user satisfaction, cultural content, accessibility | User | Qualitative | x | x | |
Skevakis et al. (2014) | Natural Europe project | Natural Europe Cultural Environment (NECE) | Usability study | Usability, data quality, user satisfaction | Expert | Qualitative | x | x | |
Steiner et al. (2014) | CULTURA | Interaction Triptych Model | Usability study | Usability, user satisfaction, data quality, coverage, performance evaluation | Expert/user | Both | x | x | x |
Stiller (2014) | Brooklyn Museum, British Library, Nationaal Archief, Europeana, Historypin, International Children’s Digital Library | Grounded theory approach; information interaction framework | Criteria-based study | Accessibility | Expert | Both | x | x | |
Stiller, Gäde, and Petras (2013) | Europeana | Multilingual access in CHDL | Criteria-based study | Accessibility | Expert | Qualitative | x | x | x |
Suire et al. (2016) | Experimental corpus of 240 cultural heritage documents | Marchionini’s framework | Log file analysis | Usage statistics and patterns | User | Quantitative | x | x | |
Sulé Duesa, Rius, and García (2011) | 31 Spanish heritage repositories | Alvite Díez’s cultural heritage evaluation | Criteria-based study | Usability, coverage, user satisfaction, data quality | Expert | Qualitative | x | x | x
Szabo, Lacedelli, and Pompanin (2017) | Dolom.it, the virtual museum of Dolomites landscape | Interpretative framework | Impact study | Impact criteria, coverage, cultural content | User | Qualitative | x | | |
Van den Akker et al. (2013) | Agora | Digital hermeneutics | Criteria-based study/usability study | Usability, coverage, cultural content, accessibility, data quality | Expert/user | Qualitative | x | x | |
Van Hooland et al. (2013) | Powerhouse Museum in Sydney | Linked data approach | Criteria-based study | Data quality | Expert | Both | x | | |
Vila-Suero et al. (2019) | Online Public Access Catalogues (catalogo.bne.es, datos.bne.es) | Semantic web technologies/linked data | Log file analysis/usability study | Usability, user satisfaction, performance evaluation, error rate | Expert | Both | x | x | x |
Wang et al. (2013) | Europeana | Three-part framework: (1) fast clustering; (2) hierarchical structuring; and (3) focal semantic clusters | Criteria-based study | Data quality, performance evaluation | User | Both | x | x | |
Xie (2006) | Library of Congress American Memory Project, ACM Digital Library, Electronic Poetry Center at State University of New York, Buffalo | User-centered approach | Criteria-based study | Accessibility, usability, coverage, user satisfaction, performance evaluation | User | Both | x | x | x |
Yelmi, Kuşcu, and Yantaç (2016) | Soundscape of Istanbul project, Soundsslike project | User-centered design approach | Criteria-based study | Accessibility, data quality, cultural content | User | Qualitative | x | x | |