In December 1998, the Mars Climate Orbiter mission was launched from Cape Canaveral in Florida. The probe, worth $125 million, was sent towards the Red Planet to study the atmosphere and climate of Mars, to search for water, and to monitor the movement of dust on its surface. The mission, which was meant to significantly deepen humanity's knowledge of the Solar System, ended in a spectacular failure. On reaching its destination, instead of passing Mars at a distance of about 150 kilometers as planned, the probe descended to an altitude of about 56 kilometers and burned up in the atmosphere.
The NASA accident investigation board reported that the cause was a computer error resulting from... a discrepancy in units of measure. The investigation revealed that engineers at the Jet Propulsion Laboratory (JPL) in California used the metric system, while employees of Lockheed Martin Astronautics in Denver, the company responsible for designing and building the probe, used the American system based on inches and pounds. The mismatch in the data confused the computers controlling the mission and led to the disaster.
– It was so stupid – John Logsdon, director of the Space Policy Institute at George Washington University, later told the Los Angeles Times. What is now called data stewardship would probably have prevented this disaster.
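The kind of unit mismatch described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the mismatched quantity is a thruster impulse reported either in pound-force seconds or in newton-seconds; the `Impulse` class and the sample readings are hypothetical and are not taken from the mission software. The point is only that a value carrying its unit can be converted safely, while bare numbers cannot.

```python
from dataclasses import dataclass

# Conversion factor: 1 pound-force second expressed in newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """A hypothetical impulse reading that always carries its unit."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        # Refuse to guess when the unit is unknown.
        raise ValueError(f"unknown unit: {self.unit!r}")


# Invented sample data: two teams reporting in different units.
readings = [Impulse(10.0, "lbf*s"), Impulse(5.0, "N*s")]

# Correct total: every reading is converted to newton-seconds first.
total = sum(r.to_newton_seconds() for r in readings)

# Naive total: the units are ignored and the numbers are simply added.
naive = sum(r.value for r in readings)

print(f"converted: {total:.2f} N*s, naive: {naive:.2f} (meaningless)")
```

The naive sum is off by a factor of more than three, and nothing in the bare numbers reveals it.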
A world full of data, or who is a data steward
As the ExplodingTopics portal reports, the world produces 328.77 million terabytes of new data every day, which adds up to about 120 zettabytes per year. The prefix "zetta" denotes 10^21, so one zettabyte is a sextillion bytes (a one followed by 21 zeros). One sextillion is sometimes compared to the number of all grains of sand on Earth.
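The conversion from the daily to the yearly figure can be checked in a few lines of Python. This is only a sanity check of the arithmetic, assuming decimal (SI) prefixes and a 365-day year; the function name is illustrative.

```python
BYTES_PER_TERABYTE = 10**12   # SI prefix "tera"
BYTES_PER_ZETTABYTE = 10**21  # SI prefix "zetta"


def yearly_zettabytes(daily_terabytes: float, days: int = 365) -> float:
    """Convert a daily volume in terabytes to a yearly volume in zettabytes."""
    yearly_bytes = daily_terabytes * BYTES_PER_TERABYTE * days
    return yearly_bytes / BYTES_PER_ZETTABYTE


estimate = yearly_zettabytes(328.77e6)
print(round(estimate))  # 120
```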
All these data are stored and processed by companies, organizations and other entities that try to extract as much useful information from them as possible. However, the main challenge now is organizing data and applying appropriate rules in managing them. This is what a data steward does.
– A data steward is a person who supports researchers in all matters related to data management and sharing – explains Magdalena Szuflita-Żurawska, Leader of the Open Science Competence Center at the Gdańsk University of Technology. – They are also responsible for implementing data usage and security policies as part of data management initiatives. The position of data steward emerged with the development of the information society and the explosion in the amount of data to be managed. The task of such a specialist (or team) is to look after corporate data so that they are processable, consistent, accessible and secure – adds Szuflita-Żurawska.
Data stewards are therefore present (or should be) in all organizations processing data, including companies, public institutions and research centers.
What is the significance of a data steward's work
The key part of a data steward's work is enabling full and meaningful access to the data an organization holds. However, simply sharing or publishing the data is not enough.
– The data shared should have a range of attributes, which can be summarized by the English acronym FAIR, created from the words: Findable, Accessible, Interoperable, and Reusable. This means that the data should be collected, cataloged and shared in a way that allows (both technically and legally) their reuse – explains Magdalena Szuflita-Żurawska.
In business organizations the requirements are similar, except that the data are often used in real time and decisions are made on their basis. Within a given organization, the consequences of bad data can sometimes be comparable to burning up the Mars Climate Orbiter in the atmosphere of the Red Planet.
Problem with percentages, problem with definitions
One example of poor data management that stewardship should catch is unclear or vague terminology. The marketing department may define "conversion" as a visitor leaving contact details, while sales specialists understand it as making a purchase. The two teams can then work together while describing different phenomena with the same word, which over time can lead to serious misunderstandings, for instance when identifying the "channel that generates the most conversions".
Similar misunderstandings can occur within the marketing department itself. It is enough for part of the team to base its metrics on views and another part on users. If both groups report percentages, the resulting reports will contain hidden errors that distort the results.
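A small Python sketch shows how the same number of conversions yields very different percentages depending on the denominator. All figures are invented for illustration, and the function name is hypothetical.

```python
def rate_percent(conversions: int, denominator: int) -> float:
    """Return conversions as a percentage of the chosen denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * conversions / denominator


# Hypothetical campaign figures, invented for illustration only.
conversions = 50
page_views = 5_000  # every page load is counted
users = 1_000       # each visitor is counted once

rate_per_view = rate_percent(conversions, page_views)
rate_per_user = rate_percent(conversions, users)

print(f"per view: {rate_per_view}%, per user: {rate_per_user}%")
```

Both values are legitimately a "conversion rate", yet one is five times the other. Placed side by side in a report without a stated denominator, they cannot be compared.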
Creating a uniform set of definitions, standards and units used to describe data in the organization is one of the basic tasks of a data steward. So is making sure that data users stick to the agreed rules and do not add to the mess in the organization's IT systems.
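One possible way to enforce shared definitions is a small glossary that reports are checked against. The sketch below is only an assumption about how such a check might look; the metric names, the `GLOSSARY` contents and the report structure are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """An agreed definition of a metric, as a steward might record it."""
    name: str
    description: str
    denominator: str


# Hypothetical glossary: one unambiguous name per phenomenon.
GLOSSARY = {
    "lead_rate": MetricDefinition(
        "lead_rate", "Share of users who left contact details", "users"),
    "purchase_rate": MetricDefinition(
        "purchase_rate", "Share of users who made a purchase", "users"),
}


def validate_report(report: dict) -> list:
    """Return a list of problems; an empty list means the report is consistent."""
    problems = []
    for metric, spec in report.items():
        definition = GLOSSARY.get(metric)
        if definition is None:
            problems.append(f"{metric}: not defined in the glossary")
        elif spec.get("denominator") != definition.denominator:
            problems.append(
                f"{metric}: denominator must be {definition.denominator!r}")
    return problems


# Invented report using an undefined term and a wrong denominator.
report = {
    "conversion": {"denominator": "views"},
    "lead_rate": {"denominator": "views"},
    "purchase_rate": {"denominator": "users"},
}
problems = validate_report(report)
for problem in problems:
    print(problem)
```

Here the ambiguous "conversion" is rejected outright, and "lead_rate" is flagged because it was computed per view rather than per user.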
A good start
The role of data stewards in organizations is not limited to supporting and maintaining existing data. Their help can also reduce what might be called the "technical debt" of data management in the future, for example by preparing a data management plan before a project begins.
– Such a plan is not only a necessary formality; it also helps to foresee possible difficulties in data management and to avoid them in advance – comments Magdalena Szuflita-Żurawska.
One such difficulty is a data format incompatible with the system in which most of the organization's information is stored. Another risk is that data can easily be lost or leaked. Imposing a specific form, framework and process for managing data makes it possible to avoid these dangers in advance.
– When verifying the plans sent to us, we often draw attention to issues that scientists have overlooked, such as whether they have the right to use previously published data for their own research, at what stage the data will be anonymized, who will have access to them and how this will be enforced – explains the expert.
If a problem is identified only at an advanced stage of the project, it is often too late to repair the damage, for example when information has been irreversibly lost after conversion to other formats or the original data have been deleted.
The path of the steward
At the moment, there is neither a precise definition of a data steward nor detailed guidelines on the criteria such a person should meet. Depending on the organization they work in, both their education and their previous experience can vary greatly.
In companies, the role of a data steward may resemble that of an analyst or an IT specialist who processes and maintains databases. However, the portal indeed.com admits that because the role is so diverse, it is difficult to collect specific data about it, which is quite ironic in this context.
– There is no single definition of a data steward. Everything depends on the specific conditions and procedures in a particular unit, as well as on the stages of the research process at which the support of data stewards is needed and to what extent. They can be administrative staff, librarians, or scientists. At Gdańsk University of Technology, the Competence Center was established at the library, which was closely tied to the development of the institutional repository and its expansion with a research data module, MOST Data – emphasizes Magdalena Szuflita-Żurawska.
She adds that regardless of a person's original profession, IT knowledge of metadata schemas and the protocols used to transmit them is extremely valuable. Knowledge of legal issues around the licenses granted when sharing data in a repository can also be important.
According to the expert, as data processing capabilities develop, the demand for data stewards is growing in both scientific and business organizations. As a result, more and more initiatives are emerging that aim to train stewards and promote good standards.
– International organizations associated with open research data regularly run training for future data stewards. One of the most important organizations promoting the idea of data stewardship is GO FAIR – says Magdalena Szuflita-Żurawska.
She adds that in science this is also because easier data processing and the possibility of reusing data are changing how not only business data but also scientific data are perceived.
– In assessing the significance of scientific research results, the center of gravity is shifting from publication in renowned scientific journals to sharing the research data that underlie them. This is reflected in the requirements set by scientific units, publishers, and above all by foreign and domestic funding institutions, including the National Science Centre – she concludes.
Business data have become a subject of interest for entire organizations, and being data-driven is currently the basic business paradigm. However, without the hard work of a data steward, no organization will achieve it. A simple mistake such as confusing an inch with a centimeter can stand in the way.