Metadata
What is it?
Metadata is information that describes data (i.e., data about data) to facilitate #findability and provide context. In addition, metadata provides ways to organize and manage data through common tags or descriptors (World Meteorological Organization). Metadata is the information that describes “the who, what, where, when, why, and how” of the dataset itself (National Centers for Environmental Information).
An easy way to think about metadata is as the top row of a spreadsheet of environmental data: the column headers that say what each value means. For example, in a water quality spreadsheet, each record would pair a measurement with descriptive information, including:
- Who: Information about the source of the data, such as what organization collected it
- What: Unit information, such as ppm (parts per million)
- When: Date, time, and time zone of a measurement
- Where: Location information, such as a site ID, latitude and longitude coordinates, or well depth
- How: Information about the sensor that collected the measurement, including calibration or serial number
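As a minimal sketch, the fields listed above could travel with a single measurement like this. Every name and value here (the organization, site ID, coordinates, sensor serial number, and the `missing_metadata` helper) is hypothetical and chosen only for illustration; a real dataset would follow whichever metadata standard it has adopted.

```python
import json
from datetime import datetime, timezone

# Hypothetical record: all names and values are illustrative, not real data.
record = {
    "value": 7.2,
    "who": {"organization": "Example Watershed Council"},
    "what": {"parameter": "dissolved_oxygen", "unit": "ppm"},
    "when": datetime(2023, 6, 1, 14, 30, tzinfo=timezone.utc).isoformat(),
    "where": {"site_id": "SITE-001", "latitude": 44.05, "longitude": -123.09},
    "how": {"sensor_serial": "SN-0000", "last_calibrated": "2023-05-15"},
}

# Descriptive fields this sketch treats as required (an assumption).
REQUIRED = ("who", "what", "when", "where", "how")


def missing_metadata(rec):
    """Return the required metadata keys that are absent or empty in rec."""
    return [key for key in REQUIRED if not rec.get(key)]


print(json.dumps(record, indent=2))
print(missing_metadata(record))
```

A check like `missing_metadata` is one simple way to catch records that arrive without their context before they are shared.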
The power of metadata lies in its #documentation. This can be as simple as a document stored alongside your data on a platform or website, or documentation within the spreadsheet itself. The documentation should cover the information above, plus other details about the data, such as specific identifiers, subjects, methodology, and data crosswalks. What you include depends on the data itself, the metadata standards you use, and your goals for the data.
While you are in the early stages of data collection or creation, make a note of file names and formats, how the data is organized, how it was generated, and how it was processed or altered, as well as an explanation of any codes or abbreviations used in the naming structure (MIT Libraries). Recording these codes and structures early makes it easier to adopt metadata standards and to build broader data documentation later.
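One way to keep such notes is to generate them from the file name itself. The sketch below assumes a made-up naming convention, `<dataset code>_<site ID>_<YYYYMMDD>.<extension>`, and a made-up code table; the `describe_file` helper and the processing notes are likewise hypothetical.

```python
import json

# Hypothetical code table explaining abbreviations used in file names.
CODES = {"WQ": "water quality", "AQ": "air quality"}


def describe_file(file_name):
    """Decode a file name that follows the assumed naming convention."""
    stem, _, extension = file_name.rpartition(".")
    dataset_code, site_id, date = stem.split("_")
    return {
        "file_name": file_name,
        "format": extension,
        "dataset": CODES[dataset_code],
        "site_id": site_id,
        "collection_date": f"{date[:4]}-{date[4:6]}-{date[6:]}",
    }


notes = describe_file("WQ_SITE001_20230601.csv")
# Illustrative notes on how the data was generated and altered.
notes["generated_by"] = "field sensor export"
notes["processing"] = ["removed duplicate rows"]

print(json.dumps(notes, indent=2))
```

Saving the printed notes next to the data file gives later users the code table and processing history without needing to ask the original collector.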
Why is it important?
Metadata is important for environmental data because it makes it easier to identify important datasets that could support environmental research and decision making. In short, it makes the data more accessible. Metadata also helps improve #transparency around where the data came from, how it was collected, and how it can be used (John Horodyski, 2022). This makes it easier to reuse the data and replicate the methods of #collection, as well as helping others improve and build on the original research.
Mentioned and Additional Resources
- To better understand the importance of metadata and using FAIR practices to create robust metadata, see "Without appropriate metadata, data-sharing mandates are pointless."
- To get a basic definition of metadata and how it applies to climate data, plus some basic guidelines for climate metadata, see "Climate Metadata."