The Elements of Data Sharing (2024)

Genomics Proteomics Bioinformatics. 2020 Feb; 18(1): 1–4.

Published online 2020 Apr 28. doi:10.1016/j.gpb.2020.04.001

PMCID: PMC7187841

PMID: 32360769

Zhang Zhang,1,2,3,4 Shuhui Song,1,2,3,4 Jun Yu,1,3,4 Wenming Zhao,1,2,3,4 Jingfa Xiao,1,2,3,4 and Yiming Bao1,2,3,4

Data and their tailored characteristics are inheritable and long-lived, outlasting the results and conclusions derived from them, regardless of whether those are produced by the data generators or by data users. Aside from designing experiments to acquire new data, scientific researchers always begin with a thorough synthesis of the existing data, especially those that have been demonstrated to be authentic and timely. This fact needs to be emphasized more than ever, as all aspects of our daily life and its measurable activities, for better or worse, are being generated and recorded to be part of the collection known as BIG DATA.

Sharing data is vital for a community of shared future

Sharing data begins with building a willing and dedicated community that consents to a shared future at a global scale. On the one hand, public emergencies, such as epidemics and pandemics caused by emerging infectious diseases, especially the two-in-a-row coronaviruses, severe acute respiratory syndrome coronavirus (SARS-CoV) and SARS-CoV-2 [1], often necessitate data sharing to expedite the translation of big data into knowledge and procedures that improve human health. On the other hand, we are now, and increasingly so, armed and empowered by many data-generating engines and tools, including high-throughput sequencing technologies and high-performance computing platforms, as well as their collaborative products: large-scale genomic big data that are generated at exponentially growing rates. Most of these data are being continuously produced, often supported by public funding [2], [3]. Clearly, data sharing becomes pivotal for many considerations and plans for action in public emergencies, since the outcomes of data sharing are of the essence in yielding a complete picture of the emergency situation, accelerating scientific research and knowledge discovery, and promoting sensible and expeditious decision-making.

Unfortunately, existing practices surrounding data sharing are not effective in achieving the maximum return on our investments. Data sharing is hindered or slowed down by a lack of clear identification of the supporting elements for its implementation. What constitutes ‘the elements of data sharing’ is, however, largely undefined. Therefore, clarifying and defining data-sharing elements would be of fundamental significance. Especially when the world faces unprecedented global threats and encounters public emergency situations (e.g., SARS-CoV-2 has spread to more than 200 countries/regions with 2,213,653 infected cases and 154,462 deaths as of 18 April 2020), we, as a community of shared future, need to specify vital elements of data sharing and establish rapid, open, and effective data release norms.

Data sharing demands a data ecosystem

Sharing data with the public involves a series of activities that span the entire life cycle of data flow and that involve all relevant parties in terms of policies for data sharing and release (particularly for data from publicly funded research), standards for data description and exchange, as well as databases for data management and access. All these relevant entities and processes together form a data-sharing ecosystem, in which data sharing is initiated by data providers and implemented in databases that play important roles in data management and provide data access for the public. Therefore, elements of data sharing should cover two major camps, one for data providers (including not only raw data generators, but also databases that provide data annotations and relationships [4]) and the other for data managers.

Promptness, openness, and usefulness are of the essence for data providers

For data providers, there are three key elements, promptness, openness, and usefulness (POU), that serve as foundational guidelines for data sharing, particularly under public emergencies and critical situations (Figure 1). Promptness is crucially important during outbreaks since “speed is everything” [5]. It is well aligned with the Bermuda Principles, proposed by the International Human Genome Sequencing Consortium in 1996, which advocate rapid public release of genome sequence data within 24 h after generation and without restrictions on use. Given the unexpected emergency circumstances, sharing data in a timely manner is immediately beneficial for researchers worldwide and, in the long term, for global human society. Certainly, in this particular case, the publication rights reserved for data providers are the major concern. In order to satisfy both parties, policies for prompt data sharing as common practice and emergency routine are to be established, accepted, and monitored by the society, where detailed considerations and facts, such as criteria for intellectual property reservation, priority for publication, and credit for data providers [6], all must be thoroughly announced and debated in professional and public settings.
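As a rough illustration of how the promptness element could be checked in practice, the sketch below compares a dataset's generation and release timestamps against the 24 h window mentioned for the Bermuda Principles. The function names and the idea of storing both timestamps with each dataset are assumptions made for this example, not part of any existing policy or database.

```python
from datetime import datetime, timedelta

# 24 h window as described for the Bermuda Principles in the text above.
RELEASE_WINDOW = timedelta(hours=24)


def release_delay(generated_at: datetime, released_at: datetime) -> timedelta:
    """Return the time elapsed between data generation and public release."""
    if released_at < generated_at:
        raise ValueError("release time precedes generation time")
    return released_at - generated_at


def is_prompt(generated_at: datetime, released_at: datetime,
              window: timedelta = RELEASE_WINDOW) -> bool:
    """Hypothetical promptness check: was the release within the window?"""
    return release_delay(generated_at, released_at) <= window
```

A database could, for instance, report the share of submissions for which is_prompt holds, with a window chosen by the community rather than fixed at 24 h.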

Figure 1. The elements of data sharing

The elements of data sharing involve promptness, openness, and usefulness for data providers, as well as deposition, integration, and translation for data managers. In full support of data-sharing activities, policies, databases, and standards should be established and acknowledged by the whole scientific community.

Openness emphasizes that both data themselves and the corresponding metadata should be released, publicized, and readily accessible in user-friendly databases. “Nothing great is ever accomplished in isolation”. Databases are not only responsible for data storage and processing, but also provide free internet access to all digital data. Currently, there are several large global centers [7] in life sciences dedicated to the collection and management of molecular data (such as DNA/protein sequences and structures), including the US National Center for Biotechnology Information (NCBI) [8], the European Bioinformatics Institute (EBI) [9], and the China National Center for Bioinformation/National Genomics Data Center (CNCB/NGDC) [10]. These publicly supported centers accept data submissions globally and provide data-sharing services worldwide. It has to be emphasized that in order to keep data always accessible and long-lived, databases should be funded in a long-term and sustainable manner.

Last but not least, the element of usefulness highlights the importance of data quality and completeness [11]. Data sharing is not a goal in itself but rather an effort to make data widely utilized. Accordingly, data to be shared must be reliable and complete, as biases and errors are characteristic of poor-quality or defective data. Moreover, data in their full spectrum are definitely preferred, including all useful digital assets such as, but not limited to, metadata, unprocessed data, derived datasets, analyzed results, source codes, protocols, and flowcharts. As a consequence, a collection of standards certainly needs to be formulated by the user–provider community, and it can be envisaged that the greater the community involvement, the more successful the data-sharing efforts will become.
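The completeness aspect of usefulness can be made concrete with a small check. The following is a minimal sketch, assuming a submission is described as a mapping from asset category to a list of files; the category identifiers mirror the assets listed above, but the Submission class and both functions are hypothetical and do not correspond to any existing database interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Asset categories named in the text; the identifiers themselves are assumptions.
EXPECTED_ASSETS = (
    "metadata",
    "unprocessed_data",
    "derived_datasets",
    "analyzed_results",
    "source_code",
    "protocols",
    "flowcharts",
)


@dataclass
class Submission:
    """Hypothetical submission: asset category -> list of file names."""
    assets: Dict[str, List[str]] = field(default_factory=dict)


def missing_assets(submission: Submission) -> List[str]:
    """Return expected categories that are absent or empty, in a fixed order."""
    return [name for name in EXPECTED_ASSETS if not submission.assets.get(name)]


def completeness(submission: Submission) -> float:
    """Fraction of expected asset categories that contain at least one file."""
    return 1 - len(missing_assets(submission)) / len(EXPECTED_ASSETS)
```

Such a score says nothing about data quality; it only flags which kinds of assets a provider has not yet shared.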

Deposition, integration, and translation are of the essence for data managers

In practice, data sharing in itself is only a single frame of its entire life cycle. In order to promote activities of data sharing, to provide easy access to all shared data, and to achieve full benefits from sharable data, databases must act as a hub by providing a suite of web services for digital data deposition, integration, and translation (DIT), which are foundational elements for data management (Figure 1). After data submission, curation is conducted to certify that the shared data are of high quality and reusable. Therefore, data curation involves a wide range of critical processes, with standardized annotation, quality filtering, and value-added representation with controlled vocabularies. Only curated data can be used for further integration with the aim of information mining and synthesis processing. Consequently, translation of big data into knowledge discovery would be achieved, accompanied by various outreach activities for knowledge dissemination and application. After all, databases provide a core instrument for data management and coordinate the data-sharing ecosystem, orchestrating all important elements relevant to curation, synthesis, and outreach (Figure 1).
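The curation steps described above (standardized annotation with a controlled vocabulary, followed by quality filtering) can be sketched as a small pipeline. Everything here is illustrative: the vocabulary entries, the record fields "host" and "sequence", and the 5% threshold for unknown bases are assumptions chosen for the example, not the rules of any particular database.

```python
from typing import Dict, List, Optional

# Hypothetical controlled vocabulary: free-text host labels -> one standard term.
HOST_VOCABULARY = {
    "human": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
    "bat": "Chiroptera",
}


def standardize_host(raw: str) -> Optional[str]:
    """Map a free-text host label to its controlled term, or None if unknown."""
    return HOST_VOCABULARY.get(raw.strip().lower())


def passes_quality(sequence: str, max_unknown_fraction: float = 0.05) -> bool:
    """Reject empty sequences and those with too many unknown ('N') bases."""
    if not sequence:
        return False
    unknown = sequence.upper().count("N")
    return unknown / len(sequence) <= max_unknown_fraction


def curate(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep records that pass both steps, with the host label standardized."""
    curated = []
    for record in records:
        host = standardize_host(record.get("host", ""))
        if host is None or not passes_quality(record.get("sequence", "")):
            continue  # a real curation workflow would likely flag these for review
        curated.append({**record, "host": host})
    return curated
```

Dropping records silently is a simplification; in a real setting, rejected submissions would more plausibly be returned to the provider with the reason for rejection.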

The POU–DIT Elements of data sharing are interrelated and can be used in any combination and evolve incrementally in response to the evolution of data ecosystems. They are applicable to a wide range of research fields, covering common aspects of data sharing in terms of timeliness, publicity, and content in POU, as well as data, information, and knowledge in DIT. Moreover, the POU–DIT Elements describing common conduct codes of data-sharing and guiding rules of data management are complementary to the FAIR Principles [12] (that define the characteristics of data, namely, Findable, Accessible, Interoperable, and Reusable). Obviously, they share common goals to promote data openness and reusability for the scientific community. Despite challenges in harmonizing with data ownership, security, privacy, and data-protection laws [2] (the European Union’s General Data Protection Regulation, the US Health Insurance Portability and Accountability Act, etc.), all important and complex issues would be best clarified via open discussions [13].

Collaboration promotes data sharing

As mentioned above, challenges always come ahead of data sharing. For instance, diversity in data processing and sharing culture within a broadly defined community, such as biomedicine (say, genomics-meets-pandemics), often poses real obstacles. Ideally, funding agencies, journals, governmental organizations, as well as hands-on researchers, must work collaboratively and come up with common-practice protocols for data-sharing activities. Currently, a valuable effort is the Global Microbial Identifier (https://www.globalmicrobialidentifier.org) that aims to build a genomic epidemiological database for global identification of microorganisms in order to detect outbreaks and emerging pathogens. Ongoing efforts for the current outbreak caused by SARS-CoV-2 primarily include GISAID [14], GenBank [15] in NCBI, and the 2019 novel Coronavirus Resource [16] (2019nCoVR; https://bigd.big.ac.cn/ncov/) in CNCB/NGDC. Among them, 2019nCoVR features comprehensive integration and value-added curation, yielding a large quantity of genome sequences with high-quality annotations (Figure 2) and providing a suite of services for viral genome data deposition, mining, and translation in real time. However, the need for data exchange and coordination between different databases, for linking genomic data with important metadata, and for data standardization across countries and laboratories has become very urgent and critical. To deal with global outbreaks such as the COVID-19 pandemic, large and effective collaborations across different database resources (e.g., 2019nCoVR, GISAID, and GenBank), disciplines, and countries towards data sharing are of immediate necessity.

Figure 2. Data-sharing scenarios in public emergency

Data planet welcomes data sharing

Collectively, data sharing is vital for translating data to knowledge, particularly when everyone in the world faces the same threat. To maximize benefits of data sharing for everyone, the POU–DIT Elements must establish logistics and standards for data sharing, provide guidance for all users, including, but not limited to, scientific researchers, policy makers, funding agencies, and journal publishers, and carry out all data-sharing activities. Some of the data and related infrastructures built in the processes, aside from the immediate utilization, may form historic memoirs and monuments for both heroes and victims of the event. Nevertheless, we need to embrace a data-sharing culture under both ordinary and extraordinary situations [17]. With a shared future, we call upon our professional colleagues to join hands and collaborate wholeheartedly to build a better data planet, where data produced by the global community are shared with the POU–DIT Elements.

Competing interests

The authors declare no competing interests.

Acknowledgments

We thank our colleagues and students for their hard work on the 2019nCoVR (https://bigd.big.ac.cn/ncov), which inspired the idea of this article. This work was supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant Nos. XDA19090116 and XDA19050302), National Key R&D Program of China (Grant No. 2017YFC0907502), 13th Five-year Informatization Plan of the Chinese Academy of Sciences (Grant No. XXH13505-05), Wong KC Education Foundation to ZZ, and the International Partnership Program of the Chinese Academy of Sciences (Grant No. 153F11KYSB20160008).

Notes

Handled by Weimin Zhu

Footnotes

Peer review under responsibility of Beijing Institute of Genomics, Chinese Academy of Sciences and Genetics Society of China.

References

1. Yang X., Yu Y., Xu J., Shu H., Xia J., Liu H. Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. Lancet Respir Med. 2020;8:475–481.

2. Phillips M., Molnar-Gabor F., Korbel J.O., Thorogood A., Joly Y., Chalmers D. Genomics: data sharing needs an international code of conduct. Nature. 2020;578:31–33.

3. The importance and challenges of data sharing. Nat Nanotechnol. 2020;15:83.

4. Gaudet P., Bairoch A., Field D., Sansone S.A., Taylor C., Attwood T.K. Towards BioDBcore: a community-defined information specification for biological databases. Nucleic Acids Res. 2011;39:D7–10.

5. Yozwiak N.L., Schaffner S.F., Sabeti P.C. Data sharing: make outbreak research open access. Nature. 2015;518:477–479.

6. Wu C.I., Poo M.M. Very fast evolution, not-so-fast publication – A proposed solution. Natl Sci Rev. 2020;7:237–238.

7. Rigden D.J., Fernandez X.M. The 27th annual Nucleic Acids Research database issue and molecular biology database collection. Nucleic Acids Res. 2020;48:D1–D8.

8. Sayers E.W., Beck J., Brister J.R., Bolton E.E., Canese K., Comeau D.C. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2020;48:D9–D16.

9. Cook C.E., Stroe O., Cochrane G., Birney E., Apweiler R. The European Bioinformatics Institute in 2020: building a global infrastructure of interconnected data resources for the life sciences. Nucleic Acids Res. 2020;48:D17–D23.

10. National Genomics Data Center Members and Partners. Database resources of the National Genomics Data Center in 2020. Nucleic Acids Res. 2020;48:D24–D33.

11. Li Y., Sperrin M., Martin G.P., Ashcroft D.M., van Staa T.P. Examining the impact of data quality and completeness of electronic health records on predictions of patients' risks of cardiovascular disease. Int J Med Inform. 2020;133:104033.

12. Wilkinson M.D., Dumontier M., Aalbersberg I.J., Appleton G., Axton M., Baak A. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3:160018.

13. Drazen J.M., Morrissey S., Malina D., Hamel M.B., Campion E.W. The importance – and the complexities – of data sharing. N Engl J Med. 2016;375:1182–1183.

14. Shu Y., McCauley J. GISAID: global initiative on sharing all influenza data - from vision to reality. Euro Surveill. 2017;22:30494.

15. Sayers E.W., Cavanaugh M., Clark K., Ostell J., Pruitt K.D., Karsch-Mizrachi I. GenBank. Nucleic Acids Res. 2020;48:D84–D86.

16. Zhao W.M., Song S.H., Chen M.L., Zou D., Ma L.N., Ma Y.K. The 2019 novel coronavirus resource. Hereditas (Beijing). 2020;42:212–221. (in Chinese with an English abstract)

17. Chretien J.P., Rivers C.M., Johansson M.A. Make data sharing routine to prepare for public health emergencies. PLoS Med. 2016;13:e1002109.

Articles from Genomics, Proteomics & Bioinformatics are provided here courtesy of Elsevier
