Data should be managed to:
The data life cycle provides a high-level overview of the stages involved in successfully managing and preserving data for use and reuse.
A data management plan (DMP) describes how you will manage your data during the lifetime of a research project. Writing a DMP forces you to think about potential data-related issues that could affect the project's timeline, costs, and personnel needs.
Goal: Answer these 4 questions:
From Recknagel and Michener, "Ecological Informatics", 2017.
Accidents happen!
Document and preserve your data when you are actively analyzing them!
You will not have to remember:
=> Easier to share with others, and good for collaboration!
We have mainly been talking about data, but these rules apply to all the scientific processes and products generated by a research project, including:
=> Easier to conduct your analysis, and even more so for others!
Data are heterogeneous in:
Spreadsheets are (still) the primary data entry tool of the digital age!
=> A spreadsheet is not a table!
Table = Relation = Data set (~ Worksheet)
Column = Variable = Attribute = Characteristic
Row = Record = Tuple <> Observation
Keys are used to Join or Merge
Cell = Value = Measurement
Data Model = Schema
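To make this vocabulary concrete, here is a minimal Python sketch (standard library only) that represents two small tables as lists of records, each record having the same named columns, and merges them on a shared key column. The table names `sites` and `plots`, their columns, and their values are hypothetical, and the `inner_join` helper is written for illustration rather than taken from any particular library.

```python
# Hypothetical tables: each list is a table (relation), each dict a row (record),
# each dict key a column (variable), and each dict value a cell (value).
sites = [
    {"site_id": "S1", "name": "North Meadow"},
    {"site_id": "S2", "name": "River Bend"},
]

plots = [
    {"plot_id": 1, "site_id": "S1", "cover_pct": 40},
    {"plot_id": 2, "site_id": "S1", "cover_pct": 55},
    {"plot_id": 3, "site_id": "S2", "cover_pct": 20},
]


def inner_join(left, right, key):
    """Merge rows of `left` and `right` that share the same value of `key`.

    Assumes `key` is unique within `left` (one site per site_id).
    """
    # Index the left table by its key so each lookup is a dict access.
    index = {row[key]: row for row in left}
    joined = []
    for row in right:
        match = index.get(row[key])
        if match is not None:
            # Combine the two matching records into one wider record.
            joined.append({**match, **row})
    return joined


result = inner_join(sites, plots, "site_id")
for row in result:
    print(row)
```

Each output record carries the site's name alongside the plot measurement, which is exactly what "joining on a key" means: the shared `site_id` column tells us which rows belong together.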
Observations about different entities combined
Observations. A better way to model data is to organize the observations about each type of entity in its own table. This results in:
This is normalized data (aka tidy data)
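As an illustration, the sketch below takes a hypothetical combined table, where the site attributes are repeated on every observation row, and splits it into one table per entity type. Column names and values are invented for the example. The `site_id` column is kept in the observation table so the two tables can still be linked, and the function raises an error if the repeated site attributes disagree, which is one of the inconsistencies that combined tables make possible.

```python
# Hypothetical combined rows: site attributes are repeated on every observation.
combined = [
    {"site_id": "S1", "site_name": "North Meadow", "elevation_m": 120, "plot_id": 1, "cover_pct": 40},
    {"site_id": "S1", "site_name": "North Meadow", "elevation_m": 120, "plot_id": 2, "cover_pct": 55},
    {"site_id": "S2", "site_name": "River Bend", "elevation_m": 85, "plot_id": 3, "cover_pct": 20},
]

# Which columns describe which entity (an assumption made for this example).
SITE_COLS = ("site_id", "site_name", "elevation_m")
OBS_COLS = ("plot_id", "site_id", "cover_pct")


def normalize(rows):
    """Split combined rows into a sites table and an observations table."""
    sites = {}  # site_id -> site record, so each site is stored only once
    observations = []
    for row in rows:
        site = {c: row[c] for c in SITE_COLS}
        existing = sites.setdefault(row["site_id"], site)
        if existing != site:
            # The same site was described differently on two rows.
            raise ValueError(f"conflicting site attributes for {row['site_id']}")
        # Keep site_id in each observation so it can be linked back to its site.
        observations.append({c: row[c] for c in OBS_COLS})
    return list(sites.values()), observations


sites, observations = normalize(combined)
print(len(sites), len(observations))  # prints: 2 3
```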
Variables. In addition, for normalized data, we expect the variables to be organized such that:
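When one has normalized data, we often use unique identifiers to reference particular observations, which allows us to link across tables. Two types of identifiers are common within relational data: primary keys, which uniquely identify each record within a table, and foreign keys, which reference the primary key of a record in another table.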
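Both kinds of identifier can be checked programmatically. A primary key must be present and unique within its table, and every foreign key value should match a primary key in the table it references. The following sketch assumes records stored as Python dictionaries; the tables, column names, and values are hypothetical, and the duplicated `obs_id` and unknown site `S3` are planted on purpose to show what the checks report.

```python
def check_primary_key(rows, key):
    """Return key values that are missing (None) or duplicated."""
    seen, problems = set(), []
    for row in rows:
        value = row.get(key)
        if value is None or value in seen:
            problems.append(value)
        seen.add(value)
    return problems


def check_foreign_key(child_rows, fk, parent_rows, pk):
    """Return foreign key values in child_rows that match no primary key in parent_rows.

    Here a missing value is also reported; SQL would allow a NULL foreign key
    unless the column is declared NOT NULL.
    """
    parent_keys = {row[pk] for row in parent_rows}
    return [row.get(fk) for row in child_rows if row.get(fk) not in parent_keys]


# Hypothetical tables with deliberate errors.
sites = [{"site_id": "S1"}, {"site_id": "S2"}]
plotobs = [
    {"obs_id": 1, "site_id": "S1"},
    {"obs_id": 2, "site_id": "S3"},  # S3 does not exist in sites
    {"obs_id": 2, "site_id": "S2"},  # obs_id 2 is duplicated
]

print(check_primary_key(sites, "site_id"))                      # prints: []
print(check_primary_key(plotobs, "obs_id"))                     # prints: [2]
print(check_foreign_key(plotobs, "site_id", sites, "site_id"))  # prints: ['S3']
```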
An Entity-Relationship model allows us to compactly draw the structure of the tables in a relational database, including the primary and foreign keys in the tables.
In the above model, one can see that each site in the SITES table must have one or more observations in the PLOTOBS table, whereas each PLOTOBS record has one and only one SITE.
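One way to express this model is as two tables in a relational database. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the lowercase table names `sites` and `plotobs`, the columns, and the values are hypothetical stand-ins for SITES and PLOTOBS. A foreign key declared `NOT NULL` enforces that each observation refers to exactly one existing site, but it cannot enforce that each site has at least one observation, so that side of the relationship is checked with a query instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE sites (
    site_id   TEXT PRIMARY KEY,
    site_name TEXT NOT NULL
);
CREATE TABLE plotobs (
    obs_id    INTEGER PRIMARY KEY,
    -- exactly one site per observation: required, and must exist in sites
    site_id   TEXT NOT NULL REFERENCES sites(site_id),
    cover_pct REAL
);
""")

conn.executemany(
    "INSERT INTO sites VALUES (?, ?)",
    [("S1", "North Meadow"), ("S2", "River Bend"), ("S3", "Dry Ridge")],
)
conn.executemany(
    "INSERT INTO plotobs VALUES (?, ?, ?)",
    [(1, "S1", 40.0), (2, "S1", 55.0), (3, "S2", 20.0)],
)

# An observation pointing at a non-existent site is rejected by the database.
rejected = False
try:
    conn.execute("INSERT INTO plotobs VALUES (4, 'S9', 10.0)")
except sqlite3.IntegrityError:
    rejected = True

# "Each site has one or more observations" is not enforced by the schema,
# so list the sites that violate it with a LEFT JOIN.
sites_without_obs = [
    row[0]
    for row in conn.execute("""
        SELECT s.site_id
        FROM sites AS s
        LEFT JOIN plotobs AS p ON p.site_id = s.site_id
        WHERE p.obs_id IS NULL
        ORDER BY s.site_id
    """)
]

print(rejected, sites_without_obs)  # prints: True ['S3']
```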