DATA CURATION PRESERVATION ACTIVITIES: BUDGETS, COSTS, STAFFING AND SKILLS

When organisations launch data curation initiatives, they often focus on the exciting parts, such as metadata tagging and building shiny new repositories. But long-term data preservation comes with a long, often invisible tail of ongoing operational challenges. If you want your data to survive and remain useful five or ten years from now, you will have to look past the technology and confront the two biggest icebergs ahead: financial sustainability and human capital.

The Financial Reality: Budgets vs Actual Costs

Many organisations treat data preservation as a capital expense, such as a one-time purchase of servers or cloud storage. In reality, preservation is a long-term operational expense that requires sustained financial commitment (Marimuthu, 2021). When building a data curation budget, organisations frequently overlook hidden expenses. The first is ingestion costs, which cover quality control, virus scanning and metadata enrichment before data even reaches the archive. Another is data egress fees: cloud providers often charge hefty fees to retrieve your data or move it between systems (Pasqui, 2024). A third is decommissioning, the complex, time-consuming process of safely shutting down old software while extracting its data. To stop overlooking these hidden costs, organisations need to move away from short-term project grants and fluctuating annual IT budgets (Bermès & Fauduet, 2011). Successful data preservation requires dedicated, ring-fenced operational funding models that account for data maintenance over its entire mandated lifecycle.
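To make the gap between a storage-only budget and a full lifecycle budget concrete, here is a minimal Python sketch. It is only an illustration: the class name, the function name and every figure are hypothetical placeholders, not real prices, and it assumes constant data volume and constant prices over the whole period.

```python
from dataclasses import dataclass


@dataclass
class PreservationCosts:
    """Hypothetical cost inputs; all figures are placeholders, not real prices."""
    ingest_per_tb: float        # one-off: quality control, virus scanning, metadata enrichment
    storage_per_tb_year: float  # recurring storage and maintenance
    egress_per_tb: float        # fee charged each time data is retrieved or moved
    decommission_fixed: float   # one-off cost of retiring a system and extracting its data


def lifecycle_cost(costs: PreservationCosts, volume_tb: float, years: int,
                   egress_tb_per_year: float) -> float:
    """Total cost of keeping volume_tb for the given number of years.

    Assumes constant volume and constant prices (a simplification).
    """
    ingest = costs.ingest_per_tb * volume_tb
    storage = costs.storage_per_tb_year * volume_tb * years
    egress = costs.egress_per_tb * egress_tb_per_year * years
    return ingest + storage + egress + costs.decommission_fixed


costs = PreservationCosts(
    ingest_per_tb=50.0,
    storage_per_tb_year=20.0,
    egress_per_tb=10.0,
    decommission_fixed=1000.0,
)

# What a storage-only budget would plan for: 10 TB kept for 10 years.
storage_only = costs.storage_per_tb_year * 10 * 10
# What the full lifecycle costs once the hidden items are included.
total = lifecycle_cost(costs, volume_tb=10, years=10, egress_tb_per_year=2)

print(f"Storage only:   {storage_only:.2f}")
print(f"Full lifecycle: {total:.2f}")
```

With these made-up numbers, the hidden items add almost as much again on top of the storage bill, which is the kind of shortfall a ring-fenced operational budget is meant to absorb.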

The Human Factor: Staffing and the Evolving Skills Gap

You can buy the best preservation software in the world, but it won’t run itself. Data curation is inherently a human discipline, and finding and keeping the right talent is a major hurdle. A top-tier data curator needs a highly specific, multidisciplinary skill set. First, you need library and information science skills, including knowledge of metadata standards such as Dublin Core or PROV-O (Nkwe & Ngoepe, 2021). Second, you need domain expertise, because someone has to understand what the data actually means. Third, you need technical literacy: people who are comfortable with cloud infrastructure, APIs, data manipulation languages such as Python or R, and security compliance.

Finding individuals who possess all three qualities is incredibly difficult, leading to fierce competition for talent between academia, government institutions and private enterprises. Data preservation relies heavily on continuity and institutional memory (Sinclair et al., 2011). When a key data steward leaves an organisation, they often take undocumented knowledge about legacy data structures with them. High staff turnover can lead to "data orphanhood", where datasets exist in storage but no one left at the organisation knows how to interpret, validate or use them.

How to Bridge the Gap: A Roadmap for the Future

Overcoming these budget and staffing bottlenecks requires a strategic shift in how organisations view data departments. First, human curators shouldn't spend their days manually tagging files. Use AI-driven metadata extraction tools to handle bulk curation tasks, so that skilled staff can focus on high-value governance, ethics and quality control (Prom, 2011). Second, organisations should build data literacy pipelines internally. Rather than searching for the perfect external candidate, they should upskill current IT staff in archival science, or train library and information professionals in cloud data management. Finally, organisations should adopt appraisal strategies: strict data appraisal policies that determine what data holds genuine long-term value and what can be safely deleted. Less data to preserve means lower costs and less strain on your staff.
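An appraisal policy is easier to apply consistently once its rules are written down explicitly. The Python sketch below shows one way that could look. The rules, field names and the five-year idle threshold are assumptions made purely for illustration; a real policy would come from the organisation's legal and archival requirements, and nothing here deletes anything.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Dataset:
    """Hypothetical record describing a dataset for appraisal purposes."""
    name: str
    last_accessed: date
    legal_hold: bool         # must be retained by law or mandate
    has_documentation: bool  # can someone still interpret it?


def appraise(ds: Dataset, today: date, max_idle_years: float = 5) -> str:
    """Return 'retain', 'review' or 'dispose-candidate' under illustrative rules."""
    if ds.legal_hold:
        return "retain"
    idle_years = (today - ds.last_accessed).days / 365.25
    if idle_years <= max_idle_years:
        return "retain"
    # Idle but documented data goes to a human reviewer; idle and undocumented
    # data is only flagged as a disposal candidate, still subject to sign-off.
    return "review" if ds.has_documentation else "dispose-candidate"


today = date(2025, 1, 1)
datasets = [
    Dataset("census_2015", date(2015, 6, 1), legal_hold=True, has_documentation=True),
    Dataset("survey_2024", date(2024, 3, 1), legal_hold=False, has_documentation=True),
    Dataset("sensor_logs", date(2017, 1, 1), legal_hold=False, has_documentation=True),
    Dataset("temp_exports", date(2016, 1, 1), legal_hold=False, has_documentation=False),
]

decisions = {ds.name: appraise(ds, today) for ds in datasets}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```

The point of the design is that the script only sorts datasets into categories. The decision to delete stays with a human curator, which fits the idea of freeing staff for governance rather than replacing them.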

Conclusion 

Data curation isn’t a technical problem with a one-time software fix. Rather, it is a continuous commitment. By acknowledging the real, ongoing costs of digital preservation and actively investing in multidisciplinary teams, organisations can ensure that their data remains an asset for the future rather than a liability.


References 

Bermès, E., & Fauduet, L. (2011). The human face of digital preservation: organizational and staff challenges, and initiatives at the Bibliothèque nationale de France. International Journal of Digital Curation, 6(1), 226–237. https://doi.org/10.2218/ijdc.v6i1.184

Marimuthu, F. (2021). Cost and management accounting fundamentals: A Southern African approach. Juta Company Limited. http://ebookcentral.proquest.com/lib

Nkwe, M., & Ngoepe, M. (2021). Compliance with freedom of information legislation by public bodies in South Africa. Government Information Quarterly, 38(2), 101567. https://doi.org/10.1016/j.giq.2021.101567

Pasqui, V. (2024). Digital curation and long term digital preservation in libraries. JLIS.it, 15(1), 109–125. https://doi.org/10.36253/jlis.it-567

Prom, C. J. (2011). Preserving digital objects in institutional contexts: Sustainability and integration issues. Library Trends, 59(4), 725–740.

Sinclair, P., et al. (2011). Digital preservation and organisational readiness: Challenges in implementation. International Journal of Digital Curation. (Commonly cited in digital curation readiness literature)

Comments

  1. It is indeed true that most organisations only consider data curation when a need arises. Your post has enlightened me on the importance of treating data curation as a long-term project if organisations are to survive.

  2. A high-end system needs to be paired with a top-notch expert who can leverage the technology's efficiency and effectiveness... well explained
