Concept / Reviewed

Data

Name variants

English: Data
Katakana: データ

Quality / Updated / COI

Quality
Reviewed
Updated
COI
none

TL;DR

Data are recorded facts about events or entities that can be analyzed to produce information and support decisions.

Definition

Data consist of raw observations such as numbers, text, images, or signals collected from processes and systems. By themselves, data may lack meaning until they are organized, cleaned, and interpreted in context. Choosing what data to collect, how they are measured, and how they are governed is a strategic decision that affects quality, privacy, and insight.

Decision impact

  • How data are defined determines what is collected and how it is measured to answer real questions.
  • The definition influences governance rules such as ownership, access, and retention.
  • It affects downstream analytics quality by shaping the accuracy and completeness of what is recorded.

Key takeaways

  • Define data with clear units, sources, and collection rules.
  • Separate raw data from derived metrics to avoid confusion.
  • Prioritize quality and relevance over sheer volume.
  • Document metadata so others can interpret the data correctly.
  • Respect privacy and ethical constraints at the point of collection.
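
The first, second, and fourth takeaways can be made concrete with a small data dictionary. The sketch below is only an illustration, not a standard schema: the `FieldSpec` class, its attribute names, and the example fields and sources are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical metadata record for one field in a dataset."""
    name: str
    unit: str              # e.g. "USD", "ISO 8601 timestamp, UTC"
    source: str            # system or process that produces the value
    collection_rule: str   # how and when the value is captured
    derived: bool = False  # True for metrics computed from raw fields


# Illustrative entries only; names, units, and sources are assumptions.
DATA_DICTIONARY = [
    FieldSpec("order_total", "USD", "point-of-sale system",
              "recorded once at checkout"),
    FieldSpec("order_timestamp", "ISO 8601 timestamp, UTC",
              "point-of-sale system", "recorded once at checkout"),
    FieldSpec("average_order_value", "USD", "analytics job",
              "mean of order_total per customer", derived=True),
]


def split_raw_and_derived(specs):
    """Return (raw, derived) lists of field names so the two stay separate."""
    raw = [s.name for s in specs if not s.derived]
    derived = [s.name for s in specs if s.derived]
    return raw, derived


raw_fields, derived_fields = split_raw_and_derived(DATA_DICTIONARY)
print("raw:", raw_fields)
print("derived:", derived_fields)
```

Keeping the `derived` flag next to the unit and source means a reader can tell at a glance whether a value was observed directly or computed, which is one simple way to avoid confusing the two.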

Misconceptions

  • Data are not automatically objective; collection choices, such as what is measured and who is sampled, can introduce bias.
  • More data does not guarantee better decisions without context.
  • Data are not the same as information or insight; processing is required.

Worked example

A retailer wants to understand repeat purchases. The team defines a customer identifier, captures transaction timestamps, and logs product categories. They add metadata for time zones and data sources to avoid misinterpretation. With clean data, analysts can calculate repeat rate and segment behavior, while privacy rules control access to personal identifiers.
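
The retailer scenario can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the transaction records are invented, the tuple layout is hypothetical, and "repeat rate" is taken to mean the share of purchasing customers with two or more transactions, which is one common definition but not the only one.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical records: (customer_id, ISO 8601 timestamp with offset, category)
TRANSACTIONS = [
    ("c1", "2024-03-01T09:15:00+09:00", "grocery"),
    ("c1", "2024-03-08T18:40:00+09:00", "household"),
    ("c2", "2024-03-02T12:00:00-05:00", "grocery"),
    ("c3", "2024-03-03T08:30:00+00:00", "apparel"),
    ("c3", "2024-03-20T21:05:00+00:00", "apparel"),
]


def to_utc(timestamp: str) -> datetime:
    """Parse an offset-aware ISO 8601 string and convert it to UTC."""
    return datetime.fromisoformat(timestamp).astimezone(timezone.utc)


def repeat_rate(transactions) -> float:
    """Share of purchasing customers with two or more transactions."""
    purchases = Counter(customer_id for customer_id, _, _ in transactions)
    if not purchases:
        return 0.0
    repeaters = sum(1 for count in purchases.values() if count >= 2)
    return repeaters / len(purchases)


# Normalize time zones first so timestamps from different stores are comparable.
normalized = [(cid, to_utc(ts), cat) for cid, ts, cat in TRANSACTIONS]
rate = repeat_rate(normalized)
print(f"repeat rate: {rate:.2f}")
```

In this invented sample, two of the three customers purchase more than once. In a real pipeline the customer identifier would typically be pseudonymized or access-controlled before analysts see it, in line with the privacy rules the example mentions.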

Citations & Trust

  • Principles of Data Science, Section 1.1, "What Is Data Science?" (OpenStax)