Not to be confused with Sanitization (classified information) or Data scrubbing.
Data cleansing or data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.[1] Data cleansing may be performed interactively with data-wrangling tools, or as batch processing through scripting or a data quality firewall.
After cleansing, a data set should be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores. Data cleansing differs from data validation in that validation typically rejects bad data from the system at the time of entry, whereas cleansing operates on batches of data already in the system.
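The distinction between entry-time validation and batch cleansing can be illustrated with a minimal sketch (the field names and repair rules here are hypothetical, not from any particular system): validation refuses a bad record outright, while cleansing repairs or drops dirty records after the fact.

```python
def validate_at_entry(record):
    """Strict validation: reject the record immediately if the age field
    is not numeric. Nothing dirty ever enters the system."""
    if not str(record.get("age", "")).isdigit():
        raise ValueError("rejected at entry: non-numeric age")
    return record

def cleanse_batch(records):
    """Batch cleansing: repair what can be repaired (coerce numeric strings
    to integers) and drop records that are beyond recovery."""
    cleaned = []
    for r in records:
        age = str(r.get("age", ""))
        if age.isdigit():
            cleaned.append({**r, "age": int(age)})
        # records with unrecoverable ages are dropped rather than rejected
    return cleaned

batch = [{"name": "Ann", "age": "34"}, {"name": "Bob", "age": "n/a"}]
print(cleanse_batch(batch))  # only Ann's record survives, age coerced to int
```

In practice the rejected or dropped records would usually be logged for review rather than discarded silently.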
The actual process of data cleansing may involve removing typographical errors or validating and correcting values against a known list of entities. The validation may be strict (such as rejecting any address that does not have a valid postal code), or it may use fuzzy or approximate string matching (such as correcting records that partially match existing, known records). Some data cleansing solutions clean data by cross-checking it against a validated data set. A common data cleansing practice is data enhancement, where data is made more complete by adding related information; for example, an address record may be enhanced by appending any phone numbers related to that address. Data cleansing may also involve harmonization (or normalization) of data, which is the process of bringing together data of "varying file formats, naming conventions, and columns",[2] and transforming it into one cohesive data set; a simple example is the expansion of abbreviations ("st, rd, etc." to "street, road, etcetera").
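Two of the techniques above can be sketched in a few lines of Python, using only the standard library. The abbreviation table, reference city list, and similarity cutoff are illustrative assumptions, not prescribed values; `difflib.get_close_matches` stands in for the fuzzy-matching step a real cleansing tool would perform.

```python
import difflib

# Hypothetical lookup tables for this sketch
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}
KNOWN_CITIES = ["Springfield", "Shelbyville", "Capital City"]

def harmonize_address(address):
    """Harmonization: expand known abbreviations token by token,
    e.g. "12 Main St." -> "12 Main street"."""
    tokens = address.split()
    return " ".join(ABBREVIATIONS.get(t.lower().rstrip("."), t) for t in tokens)

def correct_city(city, cutoff=0.8):
    """Fuzzy matching: replace a city name with the closest entry in a
    validated reference list if the match clears the similarity cutoff;
    otherwise leave the value unchanged."""
    matches = difflib.get_close_matches(city, KNOWN_CITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else city

print(harmonize_address("12 Main St."))  # -> "12 Main street"
print(correct_city("Sprngfield"))        # -> "Springfield"
```

The cutoff parameter controls how aggressive the correction is: a low cutoff repairs more typos but risks mapping genuinely distinct values onto the same reference entry.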
^Wu, S. (2013), "A review on coarse warranty data and analysis" (PDF), Reliability Engineering & System Safety, 114: 1–11, doi:10.1016/j.ress.2012.12.021
^"Data 101: What is Data Harmonization?". Datorama. 14 April 2017. Archived from the original on 24 October 2021. Retrieved 14 August 2019.