
There is strength in diversity. This article illustrates this through the different open-source software solutions for data repositories at scientific institutions in the Munich area. Within close geographic proximity, a range of markedly different software solutions serve as the foundation for these repositories. This diversity is not an end in itself. The text presents each software in its respective usage scenario, along with the features in use. The following pages are thus intended to offer guidance, also beyond Munich, to other scientific institutions seeking suitable software for a research data repository.

RDMUC: Various Approaches to Research Data Repositories in Munich (German title: RDMUC: Unterschiedliche Ansätze für Forschungsdatenrepositorien in München)


Keywords: research data; research data management; repository software

1 Introduction

Munich is one of the world's leading regions for science and innovation, including the production and management of research data. In the Munich Working Group for Research Data Management (RDMUC), many of the local scientific institutions meet regularly to exchange information on current developments. This professional exchange between institutions has developed into a sustainable and established network at the local level from which all participants benefit.

In this article, members of the RDMUC present an overview of various research data repositories in the Munich area. However, the aim of the article is not only to portray the research data infrastructure in the Bavarian capital. It also shows the high number and diversity of open-source solutions for research data repositories. There are already some comparisons of repository software tools, but none with such a regional focus.

Fig. 1: Repositories provided by RDMUC members. From top left to bottom right: MPCDF (https://metastore.mpcdf.mpg.de), Edmond (https://edmond.mpdl.mpg.de/), Open Data LMU (https://data.ub.uni-muenchen.de/), UB Discover (https://discover.ub.uni-muenchen.de/), GIN (https://gin.g-node.org/), LRZ LTDS InvenioRDM (https://www.lrz.de/forschung/projekte/forschung-daten/), mediaTUM (https://mediatum.ub.tum.de/) and Publikationen im Netz (https://publikationen.badw.de/). For detailed descriptions see section 3 of main text. All accessed January 09, 2024.

The structure of the article is designed to demonstrate the benefits and motivations of such diversity, and to introduce other data professionals and initiatives to the different software solutions that can be used. To this end, the individual repositories and their institutions are introduced. Then two tables provide a systematic overview of the different technical and organizational approaches. The overview will make clear that there is no single software solution for all purposes, but rather many options that can be tailored to specific needs and use cases.

2 Data Repository Definition

A data repository is a web application that stores and publishes research data and its metadata according to a well-defined policy, typically including the FAIR principles. Research data is data that:

a) is produced as the result of a scientific research process, or

b) serves or is required to justify or critically discuss a scientific hypothesis, analysis, or other research result, or

c) can be used as a basis for subsequent scientific approaches.

Information about research data should be stored in both human-readable and machine-readable form. Most repositories offer one or more common metadata schemas (e. g., Dublin Core) for FAIR-compliant metadata annotation. Non-standard metadata handling is also possible, but lacks support for automated processing. Stored metadata can typically be retrieved via an API using standardized protocols like OAI-PMH or ResourceSync. A data repository often offers the possibility to add a persistent identifier (PID, e. g. a DOI) to the metadata. The PID allows the dataset to be found even if it moves to a different URL.
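As a rough illustration of metadata harvesting, the following Python sketch builds an OAI-PMH ListRecords request and extracts a few Dublin Core fields from a response. The base URL and the sample response are invented for illustration; only the verb and metadataPrefix parameters and the XML namespaces are taken from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for the given endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text: str) -> list:
    """Extract identifier, titles and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in rec.iterfind(".//dc:creator", NS)],
        })
    return records


# Invented sample response; a real one would come from an HTTP GET request.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

harvest_url = list_records_url("https://repository.example.org/oai")
harvested = parse_records(SAMPLE_RESPONSE)
```

A production harvester would additionally follow resumption tokens for paged result sets, which this sketch leaves out.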

Because of the practical limits of browser-based transfer over web protocols, very large datasets (terabytes to petabytes) cannot realistically be uploaded or downloaded using web browsers or similar tools. While it is essential to store the metadata in the repository, the (large) object data it refers to may be stored elsewhere. In this case, the metadata must contain a reference to the corresponding data.
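The separation of metadata and object data can be pictured with a minimal Python sketch. The record layout, field names, DOI and storage location below are hypothetical and serve only to show a metadata record that carries a persistent identifier together with a reference to data stored outside the repository.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record kept in the repository."""
    doi: str            # persistent identifier of the dataset
    title: str
    data_location: str  # reference to the object data stored elsewhere
    size_bytes: int

    def resolver_url(self) -> str:
        """URL of the DOI resolver, which stays stable if the landing page moves."""
        return f"https://doi.org/{self.doi}"


record = DatasetRecord(
    doi="10.1234/example-dataset",  # placeholder DOI, not a real dataset
    title="Simulation output, run 7",
    data_location="s3://example-bucket/simulations/run-7/",  # placeholder
    size_bytes=250 * 10**12,  # roughly 250 TB, far too large for a browser upload
)

# Only this small JSON document needs to live in the repository itself.
record_json = json.dumps(asdict(record), indent=2)
```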

Generic repositories such as Zenodo have a broader approach and can store any kind of scientific data and publications. The repositories of the RDMUC institutions mentioned in this paper are tailored to the types of research data and publications produced in the respective institution. This narrower approach allows the typical use cases of research data in specific scientific disciplines to be accommodated.

3 Research Data Repositories in Munich

3.1 CKAN: Max Planck Computing and Data Facility (MPCDF)

The MPCDF stores more than 300 petabytes of research data, created and collected by researchers from a wide range of disciplines. The data is distributed across a heterogeneous landscape of storage systems, including tape libraries, POSIX file systems and S3-compliant object storage. Many of these datasets are "immobile," i. e. they cannot be moved via standard methods.

To enable researchers to make their data FAIR and publicly available, the MPCDF has developed a flexible data repository hosting concept based on CKAN (Comprehensive Knowledge Archive Network). CKAN is a Python-based data management system developed by the Open Knowledge Foundation. Its main feature is the creation and management of metadata for accessing and publishing datasets. A dataset is described by its metadata and may contain one or more resources. A resource can be either uploaded data or a URI pointing to the actual object data.

CKAN can be administered either through a web browser or through an extensive REST API, which allows workflow automation and integration of the repository with other external functionalities and tools. This includes nearby services such as MPCDF's BinderHub service for executing Jupyter notebooks, as well as remote services such as integration with the MPDL's DOI service.
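To give an impression of such automation, the sketch below prepares (but does not send) a request to CKAN's package_create action that registers a dataset whose single resource is a URI pointing to data stored elsewhere. The host name, token and dataset values are placeholders; depending on the configuration of an instance, further fields such as an owning organization may be required.

```python
import json
from urllib.request import Request


def build_package_create(base_url: str, api_token: str, name: str,
                         title: str, resource_url: str) -> Request:
    """Prepare a POST request for CKAN's package_create API action."""
    payload = {
        "name": name,    # URL-safe dataset identifier
        "title": title,
        "resources": [
            # The resource only references the object data by URI.
            {"name": "object data", "url": resource_url},
        ],
    }
    return Request(
        url=f"{base_url.rstrip('/')}/api/3/action/package_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_token,
                 "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; sending would be done with urllib.request.urlopen(request).
request = build_package_create(
    "https://ckan.example.org/", "secret-token",
    "simulation-run-7", "Simulation run 7",
    "https://data.example.org/simulations/run-7/",
)
```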

The MPCDF offers the hosting of CKAN instances to the institutes of the Max Planck Society. The MPCDF provides the basic infrastructure: an up-to-date virtual machine with regular backups, a basic CKAN installation with support for common plug-ins, general configuration, and the integration of the required storage. The institutes are responsible for any further customization, user management and the repository content, while the MPCDF is responsible for ongoing support, system administration, monitoring and maintenance.

This concept of "shared responsibility" between the MPCDF and the Max Planck Institutes has so far been adopted by four Max Planck Institutes, which already operate a CKAN-based data repository at the MPCDF. They cover different scientific disciplines such as archeology, physics (dark matter research and physics of light) and materials research (polymer research). An additional "catch all" instance of CKAN is currently being planned for use by researchers of institutes that do not operate a repository.

3.2 Dataverse: Edmond (MPDL)

Edmond is the open data repository of the Max Planck Society (MPG) which has been operated by the Max Planck Digital Library (MPDL) since 2014. The repository offers the possibility to store complete research datasets from all MPG disciplines under an open access license, thus creating citable research objects. The repository is named after the English astronomer Edmond Halley (1656–1742) to honor the very first dataset on Edmond.

Edmond can be used by all researchers and information specialists of the Max Planck Society using the MPG single sign-on. In addition, collaboration partners outside the MPG can register locally and receive an account.

Edmond has been running on Dataverse since February 2022. With this migration, new features such as versioning, automatic DOI reservation, and an improved user interface became available. The data is held in the S3 storage of the GWDG data center.

To support the open-source idea of Dataverse, the MPDL contributes to the improvement of the code. One example is a ZIP viewer that was released to the community in September 2022. It allows users to view the contents of zipped files directly and to extract, e. g., a readme file without having to download the entire ZIP archive.

Another example is the development of an interface to the Photon and Neutron Open Science Cloud (PaNOSC). HDF5 files that are stored in Edmond can be previewed in PaNOSC's H5Web via the interface and users can then decide whether the data is of interest to them and download it. Edmond was activated for MyBinder with the same idea in mind and since then, datasets can be loaded into MyBinder running Jupyter Notebooks. Code and data can be immediately viewed and used interactively there, increasing the reproducibility.

Edmond provides a connection to GitHub via the Dataverse software: code published on GitHub can be pushed to Edmond and get automatically published there. This feature increases the sustainability and recognition of research software.

Users can describe their datasets with a variety of metadata. Edmond provides auto-suggestions for ROR and ORCID identifiers to support the systematic entry of certain data types. In addition, a DOI is automatically assigned to every published dataset in Edmond. Furthermore, B2FIND has been enabled to harvest Edmond's metadata for display in the European Commission's OpenAIRE Explorer.

Edmond can be accessed via a web browser or via Edmond's REST API. In addition, there are the DVUploader, a Java 8-based command-line client for uploading a single large file, and the MPDL-developed "UpVerse" application for uploading large numbers of files.
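As a hedged sketch of API access, the following Python functions build request URLs for two endpoints of the Dataverse software, the Search API and the lookup of a dataset by persistent identifier. The base URL is derived from Edmond's address given in Fig. 1 plus the usual Dataverse "/api" path; the query string and the DOI are placeholders.

```python
from urllib.parse import urlencode

# Assumed API root: Edmond's address plus the "/api" path used by Dataverse.
EDMOND_API = "https://edmond.mpdl.mpg.de/api"


def search_url(query: str, per_page: int = 10) -> str:
    """URL for a dataset search via the Dataverse Search API."""
    params = urlencode({"q": query, "type": "dataset", "per_page": per_page})
    return f"{EDMOND_API}/search?{params}"


def dataset_url(doi: str) -> str:
    """URL for retrieving a dataset's JSON description by its DOI."""
    params = urlencode({"persistentId": f"doi:{doi}"})
    return f"{EDMOND_API}/datasets/:persistentId/?{params}"


example_search = search_url("neutron scattering")
example_dataset = dataset_url("10.1234/EXAMPLE")  # placeholder DOI
```

Both URLs can be fetched with any HTTP client; unpublished or restricted datasets would additionally require an API token.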

3.3 EPrints: Open Data LMU

At Ludwig-Maximilians-Universität (LMU) in Munich, the institutional research data repository "Open Data LMU" has been in operation since 2010. From a technical point of view, Open Data LMU is based on the open-source software EPrints, which is also used by the LMU University Library for a publication server and dissertation server.

The repository allows LMU members and associated researchers to deposit their research data within an institutional framework. All processes, from data submission to publication and long-term storage, take place within the server infrastructures of the University Library and, to ensure best practices in backup and archiving, also in cooperation with the Leibniz Supercomputing Centre.

Since LMU has 18 faculties, the University Library supports a wide range of use cases and subject-specific needs for research data. To make a service like Open Data LMU available to this diverse group of users, the metadata fields have to work across disciplines. Therefore, in addition to general metadata (such as title, contributor, abstract, language, license), a faceted search is possible using institution-specific extensions such as faculties and DDC. Open Data LMU also allows versioning of datasets as well as access management for specific files or datasets. All new uploads go through a formal curation process before they are published.

Metadata is stored in the EPrints database and is accessible through an OAI-PMH API. Each published item is assigned a Digital Object Identifier (DOI). Open Data LMU has no restrictions on file formats and offers solutions for large datasets. To date, 177 datasets have been published, ranging from single tabular data files of only a few KB to large datasets with up to triple-digit GB files, including high-resolution 3D-scans or raw medical data.

Although EPrints is a stable and reliable software for both users and administrators, some aspects, such as the support and interoperability of authority records and persistent identifiers, are not always executed to their full potential. In accordance with the FAIR principles and with a view to more complex requirements related to the presentation and storage of research data, the University Library is currently developing the new Discover platform, which will be discussed in the next section.

3.4 Fedora/Project Blacklight: Discover [LMU]

In order to expand the research data services already offered with Open Data LMU and to enable users to also discover research data and other LMU research outputs, the Discover platform was introduced in 2019. In 2023, all digital collections of the University Library were integrated into the Discover architecture. In the future, any data object that meets the criteria for publication in the institutional repository will be stored directly on Discover. The platform combines a variety of software and technology stacks. The foundation of Discover as a research data repository is the open-source software Fedora (also known as Fedora Commons or Fedora Repositories). Fedora stands for "flexible, extensible digital object repository architecture". It is often considered middleware and does not include all common repository features, such as discovery services or DOI minting, by default. It was originally developed by DuraSpace (the same organization that provides DSpace) and is now part of the larger portfolio of the non-profit organization LYRASIS. The main purpose of Fedora is to store data and metadata and to ensure that the data is accessible and valid for the long term.

For the production repository solution at LMU, several additional software components are assembled in the Discover architecture to extend its functionality: The Solr Index and Apache Camel Routes components allow the organization of the metadata associated with the research data. The open-source component Project Blacklight provides a modern discovery interface for the metadata in the Solr Index, presenting the research data and their associated metadata to the public. Discover can be seen as a further development of the already available EPrints-based Open Data LMU repository and will supersede the repository in the long run, so that only one central service is available to the researchers. Discover has the advantage of being highly adaptable and able to support new and upcoming developments to include more detailed metadata and to support different authority records and persistent identifier systems.
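A discovery interface of this kind ultimately issues queries against the Solr index. The sketch below assembles a faceted query URL for Solr's standard select handler. The core URL, the field name "faculty" and the filter value are hypothetical, since the text does not describe the actual index schema of Discover; only the parameter names (q, fq, rows, wt, facet, facet.field) are standard Solr query parameters.

```python
from urllib.parse import urlencode


def solr_select_url(core_url, text, facet_fields=(), filters=None, rows=10):
    """Build a Solr select URL with optional facets and filter queries."""
    params = [("q", text), ("rows", rows), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field parameter per field that should be counted.
        params += [("facet.field", field) for field in facet_fields]
    for field, value in (filters or {}).items():
        # Filter queries narrow the result set without affecting ranking.
        params.append(("fq", f'{field}:"{value}"'))
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


# Hypothetical core and field names, for illustration only.
example_query = solr_select_url(
    "http://localhost:8983/solr/discover",
    "climate",
    facet_fields=["faculty"],
    filters={"faculty": "Physics"},
)
```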

Finally, Discover's modular approach will allow the University Library to be highly flexible in expanding the resources added to Discover in the near future. Currently, research data held in Fedora is stored directly on servers provided by the University Library. In addition, external resources can be connected through OAI-PMH APIs. This allows the integration of metadata from data stored in other LMU-related repositories, e. g. Open Data LMU Physics.

3.5 GIN: German Neuroinformatics Node (G-Node, LMU)

The G-Node research data infrastructure service, GIN, is a research data platform for neuroscience, hosted at the LMU Faculty of Biology. G-Node was established in the context of Germany's participation in the International Neuroinformatics Coordinating Facility, funded by the BMBF, to provide neuroinformatics tools and services for neuroscience research.

GIN aims not just at dataset hosting but also at supporting research data management and collaboration throughout the data lifecycle. To this end, GIN provides repository management with version control and collaborative features, including fine-grained access control, issue tracking, repository forking, pull requests – similar to services for collaborative software development like GitHub or GitLab, but tailored to the needs of research data management.

GIN is built on Gogs, an open-source Git repository service. GIN extends the functionality of Gogs by support for git-annex, which provides efficient version control for large data files and minimizes data transfers. Further functionality is added through several flexible and extensible microservices, including file content indexing, data format validation, and DOI registration. A continuous integration service for research data is currently at an experimental stage.
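The idea behind git-annex can be sketched in a few lines of Python: large file content is addressed by a key derived from its size and checksum, and Git itself only versions a small reference to that key. The function below imitates the key layout of git-annex's SHA256E backend in simplified form (real git-annex applies additional rules, for example to file extensions) and is not part of GIN or git-annex; the sample file is a placeholder.

```python
import hashlib
import tempfile
from pathlib import Path


def annex_style_key(path: Path) -> str:
    """Return a content key in the style of git-annex's SHA256E backend."""
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so that large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return f"SHA256E-s{size}--{digest.hexdigest()}{path.suffix}"


# Demonstration with a small placeholder file standing in for a large recording.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "recording.bin"
    sample.write_bytes(b"\x00" * 4096)
    demo_key = annex_style_key(sample)
```

Because the key depends only on the content, an unchanged file maps to the same key in every version, so its content does not have to be stored or transferred again.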

Interaction with GIN is provided through a variety of interfaces, including a convenient web interface, a command-line client tool that enables integration of data management routines into everyday data workflows, and a REST API. Users can also work directly with git and git-annex. Moreover, GIN is fully compatible with the data management tool DataLad, which can be used as a client and supports management of distributed datasets.

GIN supports scientific work at various levels, including reproducible research data management, data sharing in collaborative projects, and data publication. GIN is recommended by publishers as a data repository for neuroscience data. For data publication, users provide metadata according to the DataCite schema in a specific file in the repository and request a DOI with the click of a button. The publication process is largely automated, but includes a curation step to ensure that the repository meets requirements for FAIR data. Seamless integration of publication encourages considering data publication not as an add-on but as an integral part of scientific practice.

The GIN software is entirely open-source, with Docker containers available for easy installation. In this way, GIN can be installed locally for research data management and collaboration within a lab or institute. While the GIN platform at LMU is domain-specific, the GIN software as such is generic and can be used for any research field.

3.6 InvenioRDM: Planned HPC RDM Service at LRZ & Open Data LMU Physics

The Leibniz Supercomputing Centre (LRZ) plans to introduce additional RDM service components for storing and publishing data at the LRZ. This service will not implement a classical data repository, but will allow a direct publication of datasets generated and stored at the LRZ. The service will combine a versatile backend component for semi-automatic metadata collection with a data-portal frontend based on InvenioRDM, a software also used by the Zenodo repository. The viability of this approach, including customization of the InvenioRDM instance, has already been demonstrated with the standalone RDM portal Open Data LMU Physics serving the needs of the LMU Faculty of Physics.

This LRZ RDM service targets large datasets on LRZ storage systems that are difficult to move to other repositories or locations. Typical sizes of such datasets, mainly from High Performance Computing (HPC) simulations, are in the range of 100 TB to several PB. The new service will enable and encourage users to publish such data (with an appropriate set of metadata) via the InvenioRDM-based portal. Access to the data itself is achieved, for example, by presenting a Globus Sharing link with the metadata.

While InvenioRDM offers comprehensive repository system capabilities, here it is actually only used for metadata publication, where metadata sets are ingested via InvenioRDM's REST API. In particular, the portal component is used in this context to facilitate the search and presentation of datasets, and thus the functionalities for individual user login and data up-/download have been disabled. The system works with a set of metadata that essentially conforms to the DataCite schema and is capable of minting and assigning DOIs. InvenioRDM facilitates versioning and thus updating of the registered datasets. The search functionality includes full-text search as well as the possibility of very versatile advanced search queries. For metadata harvesting by aggregators or search engines, InvenioRDM provides an OAI-PMH interface. All in all, this approach guarantees the publication of datasets according to the FAIR principles.
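A metadata-only record of this kind could look roughly as follows when prepared for InvenioRDM's REST API. The sketch assumes the record structure documented for InvenioRDM (access, files, metadata) and disables file handling; all values are placeholders, and placing the data access link in the description is merely an assumption, since the text does not say which metadata field carries it at the LRZ.

```python
import json


def metadata_only_record(title, family_name, given_name,
                         publication_date, description):
    """Build an InvenioRDM record payload without attached files."""
    return {
        "access": {"record": "public", "files": "public"},
        "files": {"enabled": False},  # metadata-only: the data stays where it is
        "metadata": {
            "resource_type": {"id": "dataset"},
            "title": title,
            "publication_date": publication_date,
            "creators": [{
                "person_or_org": {
                    "type": "personal",
                    "family_name": family_name,
                    "given_name": given_name,
                },
            }],
            "description": description,
        },
    }


# Placeholder values for illustration only.
payload = metadata_only_record(
    title="Turbulence simulation, run 7",
    family_name="Doe",
    given_name="Jane",
    publication_date="2024-01-09",
    description="Data access: https://example.org/shared-link (placeholder)",
)
payload_json = json.dumps(payload)
```

In InvenioRDM, such a payload is typically sent as a POST request to the records endpoint to create a draft, which is then published in a second step.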

To publish data from an LRZ storage resource, users first register their storage resource with the LRZ RDM system. Then they store DataCite metadata as a YAML "sidecar" file beside each dataset to be published. The backend component of the LRZ RDM system crawls the registered storage resources for these files and publishes the validated metadata by pushing it to the InvenioRDM API. The system is envisaged to allow automatic extraction of metadata from datasets (e. g. file names or file contents) at a later stage. Starting soon with a few "user friendly" cases (demonstrator stage), a full service will be offered to progressively larger groups of LRZ users over time.
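The crawl-and-validate step can be sketched as follows. This is not the LRZ implementation: the sidecar file suffix and the key names are assumptions (the keys loosely follow DataCite's mandatory properties), and the YAML parsing itself is left out, so the validation works on an already parsed dictionary, as a YAML library such as PyYAML would return it.

```python
import tempfile
from pathlib import Path

SIDECAR_SUFFIX = ".datacite.yaml"  # hypothetical naming convention

# Assumed key names, modeled on DataCite's mandatory properties.
REQUIRED_FIELDS = ("creators", "titles", "publisher",
                   "publicationYear", "resourceType")


def find_sidecars(root: Path) -> list:
    """Recursively collect all sidecar files below a registered storage root."""
    return sorted(p for p in root.rglob(f"*{SIDECAR_SUFFIX}") if p.is_file())


def missing_fields(metadata: dict) -> list:
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# Demonstration on a temporary directory standing in for a storage resource.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "run-7").mkdir()
    (root / "run-7" / "output.h5").write_bytes(b"")
    (root / "run-7" / "output.datacite.yaml").write_text("titles: []\n")
    found = [p.name for p in find_sidecars(root)]

incomplete = missing_fields({
    "creators": [{"name": "Doe, Jane"}],
    "titles": [{"title": "Run 7"}],
    "publisher": "Example Institute",
    "publicationYear": 2024,
})
```

Only records for which missing_fields returns an empty list would be passed on to the publication step.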

3.7 mediaTUM: TUM's Open-Source Solution for Institutional Repositories

mediaTUM is the repository of the Technical University of Munich (TUM). The name mediaTUM is also used for the software itself. Scientific publications such as journal articles, conference papers, research data, university publications, image and video collections as well as valuable scientific holdings are published and archived in mediaTUM.

The DFG-funded project IntegraTUM ran from 2004 to 2009 and aimed to create a user-friendly and seamless infrastructure for information and communication at the TUM.

One part of this multi-part project was to create a platform that would allow researchers to create and publish bibliographic records themselves. Since the software available at the time could not fulfill all the requirements, it was decided to develop mediaTUM as in-house software.

Today mediaTUM is a multifunctional repository software. It allows publishing, indexing, managing, searching and distributing publications with and without full text.

The publication of research data is a curated process supported by the library. Metadata following the DataCite schema is used to describe the data in a FAIR way. The metadata contains the link to the storage system where the actual dataset can be downloaded via FTP, rsync or HTTP. There are no restrictions on data size, format or structure. mediaTUM is characterized by a flexible rights management system, in which access rights can be restricted to individuals or a user group, defined down to the level of individual objects.

The publication process of dissertations, and bachelor and master theses is controlled by a workflow engine. This includes editing rights for involved parties at different stages of the process and communication via automatic e-mail transmission. The data stored in mediaTUM can be easily integrated into other applications via a REST interface and can be displayed in other search platforms such as OpenAIRE, BASE, DNB and Open bydata.

With the help of a JavaScript export script it is possible to export publication lists to websites.

mediaTUM is not only used by the TUM, but also by AtheneForschung at the University of the Federal Armed Forces in Munich and the Catholic University of Eichstätt-Ingolstadt as a publication platform for digital image and film collections.

3.8 Geist at BAdW: Open Access Publication Repository of the BAdW

Geist is the name of the framework used for the central publication platform of the Bavarian Academy of Sciences and Humanities (BAdW). It is also used for several other data repositories of the BAdW and the Max-Weber-Stiftung.

Geist differs from other publication or data repository systems in several ways, reflecting the specific conditions of the BAdW as a scientific institution: first, the vast majority of BAdW's research projects are in the humanities rather than the natural sciences, which means that most of the material is primarily intended for human readers rather than for machine consumption – although this is acknowledged as a secondary goal. In particular, the output of the scholarly society ("Gelehrtengemeinschaft") consists entirely of research papers (called "Sitzungsberichte" and "Abhandlungen"). Thus, the Geist system began as a publication platform and only later evolved into a research data platform.

Second, most digital publications are produced by the Digital Humanities (DH) Division of the BAdW or received by the DH Division as secondary publications that have already been published elsewhere and are republished as Open Access on publikationen.badw.de. In both cases, the material is uploaded directly to the platform via FTP according to well-defined workflows. Bibliographic metadata is imported directly from B3Kat, which removes the need to enter publications more than once and eliminates the need for a user interface to enter bibliographical records. This simplifies the design of the system and reduces the effort required for its long-term maintenance.

Tab. 1: Organizational Details

Features	CKAN at MPCDF	Dataverse at MPDL	EPrints at LMU	Fedora at LMU	GIN	InvenioRDM at LRZ	mediaTUM	Geist at BAdW
Metadata schemata and fields	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Workflows for publishing sensitive research data	Yes	No	Yes	Yes	No	No	Yes	No
Rights management (for content)	Yes (via metadata)	Yes	Yes	Yes	Yes	No (only open data is published)	Yes	Yes
ORCID and ROR	Yes (via metadata)	Yes	ORCID	Yes	Yes (ROR limited)	Yes	No	No
DOI assignment	Yes	Yes	Yes	Yes	Yes	Yes	Yes	URL as PID
Distribution and developer community	Yes (via metadata)	Yes	Yes	Yes	Yes	Yes	Yes	No
Usage statistics	No (only admin level log files)	Yes	Yes	No	No	Yes	Implicit	Implicit
Usage model: Self-service or curation process for publishing	Self Service	Self Service	Self Service with Curation	Curation	Self Service with Curation	Both possible (no curation process at LRZ instance)	Both	Curation
Preservation strategy	No	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Persons involved (admins, developers, etc.) at the institution	Admins, Librarians, Scientists, ...	Admins, Librarians, Scientists, ...	Admins, Developers, Librarians, Scientists	Admins, Developers, Librarians, Scientists	Admins, Developers, Scientists	Admins, Developers, Scientists, possibly Librarians	Scientists, Librarians, Developers	DevOps, student assistants

Tab. 2: Technical Details

Features	CKAN at MPCDF	Dataverse at MPDL	EPrints at LMU	Fedora at LMU	GIN	InvenioRDM at LRZ	mediaTUM	Geist at BAdW
File formats	No limitations	No limitations	No limitations	No limitations	No limitations	No limitations	No limitations	No limitations
Metadata output formats	Any schema possible via plugin	Dublin Core, DDI, DataCite, DDI HTML Codebook, JSON, OAI_ORE, OpenAIRE, Schema.org, JSON-LD	Any schema possible via plugin	No limitations	DataCite XML, YAML, schema.org microdata	JSON, CSL, DataCite JSON, DataCite XML, Dublin Core XML	BibTeX, Dublin Core, DataCite, OAI, XMetaDissPlus, JSON	OAI-PMH/MARC21, BibTeX, RIS, plain text
(Advanced) Search	Yes (Lucene in backend)	Yes	Yes	No	Yes	Yes	Yes	No
(Data) Workflow engine	Views for certain data types	Not integrated	Yes	No	No	No	Yes	No
Extensibility through plugins	Yes	Yes	Yes	No	Microservices	Yes	Yes	Yes
Navigation tree	Yes	No	Yes	Yes	Within Dataset	No	Yes	Navigation table
Hierarchical structuring with folders	Yes	Yes	No	Yes	Yes	No	Yes	No
Versioning	Yes (via metadata)	Yes	Yes	Yes	Yes	Yes	Yes	Yes, internally
Import possibilities	Yes (via API)	Yes	Yes	Yes	Yes	Yes	Yes	No
SWORD Interface	No	Yes	Yes	No	No	No	No	No
OAI Interface	Yes (old plugin)	No	Yes	Yes	No	Yes	Yes	Yes
Freely designable (metadata) templates	Yes	Partly	Yes	No frontend	Yes	No frontend	Yes	No
Customisable surface ("Multi-client capability")	Yes	Partly	Yes	No frontend	Yes	Partly	Partly	No
Duplicate check	No	Yes	Yes	No	Yes	Partly	No	Yes
Programming language	Python	Java	Perl	Java	Go	Python, JavaScript, HTML	Python	Python, JavaScript
Containerisation	Yes (Docker compose)	Soon forthcoming	No	Yes	Yes	Yes	No	No

For persistent identification, Geist uses its own permanent URL scheme, the last part of which is usually similar to the B3Kat number. Because URLs can be made persistent by the institution that owns them, there is no need for resolver-based permanent identifiers such as DOI or URN.

Although the Geist-instance behind publikationen.badw.de serves as a central hub for all (human-readable) publications of the BAdW, only a part of the BAdW's research data is served by a Geist-system. This is due to the fact that specific use cases that require special functionality sometimes have to be addressed.

Although the software of the Geist-system is open-source, it has to be considered an in-house product, which means that no effort is made to promote or distribute it to third parties. However, the fact that the Geist-system is also used by Max-Weber-Stiftung (MWG) for some of their data publications shows that this restriction does not have to be an obstacle for reuse.

4 Matrix and Comparison

Tables 1 and 2 summarize the repository solutions used by the RDMUC institutions and provide a more detailed overview and comparison of the software features.
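For institutions that want to work with such a matrix programmatically, the following Python sketch encodes four rows of Table 2 (programming language, OAI interface, SWORD interface, containerisation) and filters the solutions by required features. The data structure and function names are illustrative; "Soon forthcoming" for containerisation is treated as not yet available.

```python
# Selected rows of Table 2, encoded per repository solution.
REPOSITORIES = {
    "CKAN at MPCDF": {"languages": ("Python",), "oai": True, "sword": False, "containerised": True},
    "Dataverse at MPDL": {"languages": ("Java",), "oai": False, "sword": True, "containerised": False},
    "EPrints at LMU": {"languages": ("Perl",), "oai": True, "sword": True, "containerised": False},
    "Fedora at LMU": {"languages": ("Java",), "oai": True, "sword": False, "containerised": True},
    "GIN": {"languages": ("Go",), "oai": False, "sword": False, "containerised": True},
    "InvenioRDM at LRZ": {"languages": ("Python", "JavaScript", "HTML"), "oai": True, "sword": False, "containerised": True},
    "mediaTUM": {"languages": ("Python",), "oai": True, "sword": False, "containerised": False},
    "Geist at BAdW": {"languages": ("Python", "JavaScript"), "oai": True, "sword": False, "containerised": False},
}


def select(**required):
    """Return the solutions whose features match all required values."""
    return [name for name, features in REPOSITORIES.items()
            if all(features[key] == value for key, value in required.items())]


def written_in(language):
    """Return the solutions implemented (at least partly) in a language."""
    return [name for name, features in REPOSITORIES.items()
            if language in features["languages"]]


harvestable_containers = select(oai=True, containerised=True)
python_based = written_in("Python")
```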

5 Conclusion

In this paper, we have described the diverse research data repository landscape around Munich, as reflected in the RDMUC community. This landscape accommodates the different backgrounds and responsibilities of the participating institutions. Behind the use of different repository frameworks, however, there is a solid common ground for RDMUC: a strong preference for sustainable open-source solutions and an agreement on supporting the FAIR principles. We consider metadata to be as important as the research data itself and provide methods for the persistent identification of all resources. Tools for handling metadata or adding functionality to repositories are often shared within RDMUC. Likewise, RDMUC provides an important forum for us to discuss developments in the rapidly evolving field of research data management.

As a result, the RDMUC community excels in streamlining the organizational and administrative processes, including best practices around our repositories, and in developing common methods to support researchers with our multi-disciplinary teams. It adds considerable value to the "to each their own" approach we have historically had with our repositories, which is still reflected to some extent in the different software products used. We are now able to collaboratively advise researchers across our institutions on the services that are available. The long-term goal of RDMUC is to provide an increasingly consistent portfolio of research data services in the Munich area.

Footnotes

1 See: https://forschungsdaten.info/fdm-im-deutschsprachigen-raum/deutschland/bayern/projekte-und-netzwerke/netzwerke/rdmuc-muenchner-arbeitskreis-fuer-forschungsdatenmanagement/. Accessed December 06, 2023.
2 For comparisons, see: Bankier, Jean Gabriel, and Kenneth Gleason. Institutional repository software comparison. Paris: United Nations Educational, Scientific and Cultural Organization 2014. https://unesdoc.unesco.org/ark:/48223/pf0000227115. Accessed October 26, 2023; Castagné, M. Institutional repository software comparison: DSpace, EPrints, Digital Commons, Islandora and Hydra. Vancouver: University of British Columbia 2013. https://doi.org/10.14288/1.0075768; Mias, Erika, Kayleigh Roos, and Jason van Rooyen. "Suggesting an Institutional Data Repository for the University of Cape Town." Zenodo 2016. Table 2. Comparison of open-source repository software features. https://doi.org/10.5281/zenodo.263823.
3 Wilkinson, M., M. Dumontier, I. Aalbersberg et al. "The FAIR Guiding Principles for scientific data management and stewardship." Sci Data 3, 160018 (2016), https://doi.org/10.1038/sdata.2016.18.
4 See: https://www.dublincore.org. Accessed December 06, 2023.
5 See: https://www.openarchives.org/pmh/. Accessed December 06, 2023; for more details, see: Lynch, Clifford A. "Metadata Harvesting and the Open Archives Initiative." ARL: A Bimonthly Report 217 (August 2001): 1–9, http://www.arl.org/resources/pubs/br/br217/br217mhp.shtml. Accessed December 06, 2023.
6 Haslhofer, Bernhard, Simeon Warner, et al. "ResourceSync: Leveraging Sitemaps for Resource Synchronization." Proceedings of the 22nd International Conference on World Wide Web (2013): 11–14, https://doi.org/10.1145/2487788.2487793.
7 See: https://www.doi.org/.
8 See: European Organization For Nuclear Research, OpenAIRE. Zenodo. CERN 2013. https://doi.org/10.25495/7GXK-RD71.
9 See: https://ckan.org/. Accessed December 06, 2023.
See: https://edmond.mpdl.mpg.de/. Accessed December 06, 2023; and Franke, Michael, and Yves Vincent Grossmann. New Data and Code Features in Edmond – Constantly Improving the Institutional Open Research Data Repository for the Max Planck Society. Berlin 2023. https://doi.org/10.17617/2.3526392.
See: https://viaf.org/viaf/71400665/. Accessed December 06, 2023.
See: https://doi.org/10.17617/3.f.
See: https://dataverse.org. Accessed December 06, 2023.
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen is a joint institution of the Georg-August-Universität Göttingen – Foundation under Public Law and the Max-Planck-Gesellschaft; see https://www.gwdg.de. Accessed December 06, 2023.
See: https://github.com/gdcc/dataverse-previewers/pull/9. Accessed December 06, 2023.
See: https://www.panosc.eu. Accessed December 06, 2023.
See: https://h5web.panosc.eu. Accessed December 06, 2023.
See: https://mybinder.org. Accessed December 06, 2023.
See: https://edmond.mpdl.mpg.de. Accessed December 06, 2023.
The starting point is https://edmond.mpdl.mpg.de/api/. Accessed December 06, 2023.
Open Data LMU is powered by EPrints 3 (http://eprints.org/software/), which is developed by the School of Electronics and Computer Science (http://www.ecs.soton.ac.uk/) at the University of Southampton. Both accessed December 06, 2023.
See: https://epub.ub.uni-muenchen.de/ and https://edoc.ub.uni-muenchen.de/. Accessed December 06, 2023.
See: https://data.ub.uni-muenchen.de/view/subjects/index.html. Accessed December 06, 2023.
See: https://data.ub.uni-muenchen.de/view/ddc/. Accessed December 06, 2023.
See: https://data.ub.uni-muenchen.de/cgi/oai2 (includes Dublin Core and – through a custom plugin – also DataCite). Accessed December 06, 2023.
See: https://fedora.lyrasis.org/ and https://github.com/fcrepo/fcrepo. Accessed December 06, 2023.
See: https://www.lyrasis.org. Accessed December 06, 2023.
See: https://solr.apache.org/. Accessed December 06, 2023.
See: https://camel.apache.org/manual/routes.html. Accessed December 06, 2023.
See: https://projectblacklight.org/. Accessed December 06, 2023.
See: https://gin.g-node.org; RRID:SCR_007279; re3data: https://doi.org/10.17616/R3SX9N; fairsharing.org: https://doi.org/10.25504/FAIRsharing.nv6mrg. All accessed December 06, 2023.
See: INCF, https://incf.org. Accessed December 06, 2023.
See: https://gogs.io. Accessed December 06, 2023.
See: https://git-annex.branchable.com. Accessed December 06, 2023.
See: https://datalad.org. Accessed December 06, 2023.
See: https://inveniosoftware.org/products/rdm/. Accessed December 06, 2023.
See: https://www.zenodo.org. Accessed December 06, 2023.
See: https://www.re3data.org/repository/r3d100014139 and https://lmuphys.rdm.lab.lrz.de/. Accessed December 06, 2023.
See: https://www.globus.org/data-sharing. Accessed December 06, 2023.
See: https://inveniosoftware.org/products/rdm/. Accessed December 06, 2023.
See: https://schema.datacite.org/. Accessed December 06, 2023.
Institutional repository: https://mediatum.ub.tum.de/; Software: https://github.com/mediatum/mediatum. Both accessed December 06, 2023; Leiß, Johann, Edwin Pretz, and Arne Seifert. "mediaTUM: Der zentrale Medienserver der Technischen Universität München." In: Informationsmanagement in Hochschulen. Edited by Arndt Bode and Rolf Borgeest. Berlin, Heidelberg: Springer-Verlag, 2010. 365–377. https://doi.org/10.1007/978-3-642-04720-6_29.
See: https://www.it.tum.de/it/projekte/archiv/integratum/. Accessed December 06, 2023; and Bode, Arndt, and Rolf Borgeest (eds.). Informationsmanagement in Hochschulen. Berlin, Heidelberg: Springer-Verlag 2010. https://doi.org/10.1007/978-3-642-04720-6.
See: https://publikationen.badw.de/. Accessed December 06, 2023.
List of sites made with Geist: https://gitlab.lrz.de/badw-it/geist#examples-for-sites-made-with-geist. Accessed December 06, 2023.
Arnold, Eckhart, and Stefan Müller. "Wie permanent sind Permalinks?" In: Informationspraxis 3,1 (2017). https://doi.org/10.11588/ip.2016.2.33483.
See: Geist: https://gitlab.lrz.de/badw-it/geist and https://gitlab.lrz.de/badw-it/system; Stefan Müller (Lead Developer).
E.g., a third-party metadata generator is integrated in a few of the mentioned repositories.

By Eckhart Arnold; Alexander Berg-Weiß; Yves Vincent Grossmann; Martin Grummann; Stephan Hachinger; Olga Klopova; Larissa Leiminger; Johannes Munke; Martin Spenger; Thomas Wachtler; Christine Wolter and Thomas Zastrow

Title:	RDMUC: Various Approaches to Research Data Repositories in Munich.
Authors / Contributors:	Arnold, Eckhart ; Berg-Weiß, Alexander ; Grossmann, Yves Vincent ; Grummann, Martin ; Hachinger, Stephan ; Klopova, Olga ; Leiminger, Larissa ; Munke, Johannes ; Spenger, Martin ; Wachtler, Thomas ; Wolter, Christine ; Zastrow, Thomas
Journal:	ABI Technik, Vol. 44 (2024-02-01), No. 1, pp. 28-38
Published:	2024
Media type:	academicJournal
ISSN:	0720-6763 (print)
DOI:	10.1515/abitech-2024-0004
Subject terms:	DATA management; DATA libraries; INSTITUTIONAL repositories; COMPUTER software; MUNICH (Germany)
Keywords:	repository software; research data; research data management; Forschungsdaten; Forschungsdatenmanagement; Repositorien-Software (language of keywords: English; German)
Indexed in:	DACH Information
Language:	German
Alternate title:	RDMUC: Unterschiedliche Ansätze für Forschungsdatenrepositorien in München.
Document type:	Article
Geographic terms:	MUNICH (Germany)
Author affiliations:	1 = Bayerische Akademie der Wissenschaften, Referat für Digital Humanities Forschung & Entwicklung München, Deutschland ; 2 = Ludwig-Maximilians Universität München, Universitätsbibliothek München, Deutschland ; 3 = Max Planck Digital Library München, Deutschland ; 4 = Technische Universität München, Universitätsbibliothek München, Deutschland ; 5 = Bayerische Akademie der Wissenschaften, Leibniz-Rechenzentrum, Garching bei München, Deutschland ; 6 = Ludwig-Maximilians-Universität München, Faculty of Biology Planegg-Martinsried, Deutschland ; 7 = Max Planck Computing and Data Facility, Datagroup, Garching bei München, Deutschland
Full text word count:	5144
