Data lineage information


Data lineage includes the data origin, what happens to it, and where it moves over time.[1] Data lineage provides visibility and simplifies tracing errors back to the root cause in a data analytics process.[2]

It also enables replaying specific portions or inputs of the data flow for step-wise debugging or regenerating lost output. Database systems use such information, called data provenance, to address similar validation and debugging challenges.[3] Data provenance refers to records of the inputs, entities, systems, and processes that influence data of interest, providing a historical record of the data and its origins. The generated evidence supports forensic activities such as data-dependency analysis, error/compromise detection and recovery, auditing, and compliance analysis. "Lineage is a simple type of why provenance."[3]

Data lineage can be represented visually to show how data flows from its source to its destination through the various changes and hops along the way in the enterprise environment: how the data is transformed, how its representation and parameters change, and how it splits or converges after each hop. A simple representation uses dots and lines, where each dot represents a data container for data points and each line connecting two dots represents the transformation the data undergoes between those containers.
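The dots-and-lines picture maps naturally onto a small graph structure. The following is a minimal sketch, not a reference implementation: the class names `Transformation` and `LineageGraph` and the container names such as `crm.customers` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transformation:
    """A line in the diagram: data moves from `source` to `target`."""
    source: str
    target: str
    description: str


@dataclass
class LineageGraph:
    """Dots (data containers) connected by lines (transformations)."""
    containers: set = field(default_factory=set)
    transformations: list = field(default_factory=list)

    def add(self, source, target, description):
        # Register both containers and the transformation between them.
        self.containers.update((source, target))
        self.transformations.append(Transformation(source, target, description))

    def outgoing(self, container):
        # All transformations that read from the given container.
        return [t for t in self.transformations if t.source == container]


# Hypothetical example flow; every name below is invented for illustration.
graph = LineageGraph()
graph.add("crm.customers", "staging.customers", "nightly extract")
graph.add("staging.customers", "warehouse.dim_customer", "deduplicate and rename columns")
graph.add("warehouse.dim_customer", "reports.churn", "join with orders")
graph.add("warehouse.dim_customer", "reports.revenue", "aggregate by region")
```

In this sketch the data splits after `warehouse.dim_customer`, which has two outgoing transformations; a convergence would appear as two transformations sharing the same target.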

The representation broadly depends on the scope of the metadata management and the reference point of interest. Backward data lineage shows the sources of the data and the intermediate data flow hops leading to the reference point; forward data lineage shows the final destinations' data points and the intermediate data flows leading to them from the reference point. These views can be combined into end-to-end lineage for a reference point, which provides a complete audit trail of that data point from its sources to its final destinations. As the number of data points or hops increases, the representation becomes too complex to comprehend. A key feature of a data lineage view is therefore the ability to simplify it by temporarily masking unwanted peripheral data points. Tools that have the masking feature keep the view scalable and make analysis easier for both technical and business users. Data lineage also enables companies to trace the sources of specific business data in order to track errors, implement changes in processes, and implement system migrations, saving significant time and resources and improving BI efficiency.[4]
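Backward, forward, and end-to-end lineage, together with masking, can be sketched as graph traversals from a reference point. This is one possible approach under stated assumptions: the edge list `EDGES`, the function names, and the choice of breadth-first search are all illustrative and not taken from any particular lineage tool.

```python
from collections import defaultdict, deque

# Hypothetical edges: (source container, target container).
EDGES = [
    ("crm.customers", "staging.customers"),
    ("erp.orders", "staging.orders"),
    ("staging.customers", "warehouse.dim_customer"),
    ("staging.orders", "warehouse.fact_orders"),
    ("warehouse.dim_customer", "reports.churn"),
    ("warehouse.fact_orders", "reports.churn"),
    ("warehouse.fact_orders", "reports.revenue"),
]


def build_indexes(edges):
    """Index the edges in both directions for fast neighbour lookup."""
    downstream, upstream = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream


def _reachable(start, neighbours):
    """Breadth-first search over `neighbours`, excluding `start` itself unless a cycle leads back to it."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def backward_lineage(edges, point):
    """Every container that feeds the reference point, directly or indirectly."""
    _, upstream = build_indexes(edges)
    return _reachable(point, upstream)


def forward_lineage(edges, point):
    """Every container that the reference point feeds, directly or indirectly."""
    downstream, _ = build_indexes(edges)
    return _reachable(point, downstream)


def end_to_end(edges, point):
    """Sources, the reference point itself, and destinations."""
    return backward_lineage(edges, point) | {point} | forward_lineage(edges, point)


def masked_view(edges, point):
    """Mask peripheral containers: keep only edges inside the end-to-end lineage."""
    keep = end_to_end(edges, point)
    return [(s, d) for s, d in edges if s in keep and d in keep]
```

With `warehouse.fact_orders` as the reference point, the masked view drops the customer branch entirely, because none of those containers lies upstream or downstream of the chosen point.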

The scope of the data lineage determines the volume of metadata required to represent it. Usually, data governance and data management determine the scope of the data lineage based on their regulations, the enterprise data management strategy, data impact, reporting attributes, and the critical data elements of the organization.

Data lineage provides the audit trail of the data points at the finest level of granularity, but the lineage may be presented at various zoom levels to simplify the vast amount of information, similar to analytic web maps. At a very high level, data lineage shows which systems the data interacts with before it reaches its destination. As the granularity increases, the view reaches the data point level, where it can provide the details of the data point and its historical behavior, attribute properties, trends, and the data quality of the data passed through that specific data point.
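Zoom levels can be illustrated by rolling fine-grained lineage up to coarser names. The sketch below assumes a hypothetical `system.table.column` naming scheme and invented edges; real tools may identify data points quite differently.

```python
# Hypothetical column-level edges, each endpoint named "system.table.column".
COLUMN_EDGES = [
    ("crm.customers.email", "warehouse.dim_customer.email"),
    ("crm.customers.name", "warehouse.dim_customer.full_name"),
    ("erp.orders.amount", "warehouse.fact_orders.amount"),
    ("warehouse.fact_orders.amount", "bi.revenue_report.total"),
    ("warehouse.dim_customer.full_name", "bi.revenue_report.customer"),
]

# Number of leading name parts kept at each zoom level.
LEVELS = {"system": 1, "table": 2, "column": 3}


def zoom(edges, level):
    """Collapse edges to the requested level, dropping duplicates and self-loops."""
    parts = LEVELS[level]

    def truncate(name):
        return ".".join(name.split(".")[:parts])

    rolled = {(truncate(src), truncate(dst)) for src, dst in edges}
    return sorted((src, dst) for src, dst in rolled if src != dst)


system_view = zoom(COLUMN_EDGES, "system")
table_view = zoom(COLUMN_EDGES, "table")
column_view = zoom(COLUMN_EDGES, "column")
```

Here the five column-level edges collapse to four table-level edges and three system-level edges, which is the kind of simplification a high-level view provides.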

Data governance plays a key role in metadata management by providing guidelines, strategies, policies, and implementation. Data quality and master data management help enrich the data lineage with more business value. Even though the final representation of data lineage is provided in one interface, the way the metadata is harvested and exposed to the data lineage graphical user interface can differ entirely. Thus, data lineage can be broadly divided into three categories based on the way metadata is harvested: data lineage involving software packages for structured data, programming languages, and big data.

Data lineage information includes technical metadata involving data transformations. Enriched data lineage information may include data quality test results, reference data values, data models, business vocabulary, data stewards, program management information, and enterprise information systems linked to the data points and transformations. The masking feature in the data lineage visualization allows tools to incorporate all the enrichments that matter for a specific use case. To represent disparate systems in one common view, "metadata normalization" or standardization may be necessary.
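As an illustration of metadata normalization and enrichment, the sketch below converts records from two differently shaped sources into one common edge format and then attaches business metadata. Both input formats, all field names, and the sample values are hypothetical; they do not correspond to any real tool's export.

```python
def from_etl_export(record):
    """Normalize a hypothetical ETL-tool export with upper-case object names."""
    return {
        "source": record["FROM_OBJECT"].lower(),
        "target": record["TO_OBJECT"].lower(),
        "transformation": record["JOB_NAME"],
        "harvested_from": "etl_tool",
    }


def from_sql_parser(record):
    """Normalize hypothetical SQL-parser output: one edge per input table."""
    return [
        {
            "source": table.lower(),
            "target": record["output"].lower(),
            "transformation": record["statement"],
            "harvested_from": "sql_parser",
        }
        for table in record["inputs"]
    ]


def enrich(edge, stewards, quality_results):
    """Return a copy of a normalized edge with optional business metadata attached."""
    enriched = dict(edge)
    enriched["data_steward"] = stewards.get(edge["target"])
    enriched["quality_tests"] = quality_results.get(edge["target"], [])
    return enriched


# Invented sample inputs from two disparate systems.
etl_record = {"FROM_OBJECT": "CRM.CUSTOMERS", "TO_OBJECT": "STAGING.CUSTOMERS", "JOB_NAME": "nightly_extract"}
sql_record = {
    "inputs": ["staging.customers", "staging.orders"],
    "output": "reports.churn",
    "statement": "INSERT INTO reports.churn SELECT ...",
}

# One common view: every edge has the same keys regardless of where it was harvested.
edges = [from_etl_export(etl_record)] + from_sql_parser(sql_record)

stewards = {"reports.churn": "analytics team"}
quality_results = {"reports.churn": ["row count check: passed"]}
enriched_edges = [enrich(edge, stewards, quality_results) for edge in edges]
```

Once every edge shares the same keys, a visualization layer can treat all sources uniformly and show or mask enrichments such as `data_steward` depending on the use case.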

  1. "What is Data Lineage? - Definition from Techopedia".
  2. Hoang, Natalie (2017-03-16). "Data Lineage Helps Drives Business Value - Trifacta". Trifacta. Retrieved 2017-09-20.
  3. De, Soumyarupa (2012). Newt: an architecture for lineage based replay and debugging in DISC systems. UC San Diego: b7355202. Retrieved from https://escholarship.org/uc/item/3170p7zn
  4. Drori, Amanon (2020-05-18). "What is Data Lineage? - Octopai". Octopai. Retrieved 2020-08-25.
