Data transformation is the process of changing data in some way. More formally, a transformation creates a new variable or set of variables from an existing variable or set of variables.
Objectives of transformation
Data transformation is undertaken with the following objectives:
- Making it easier to see patterns in the data (e.g., log transformations and Principal Components Analysis).
- Making it easier to communicate patterns in the data (e.g., the Net Promoter Score).
- Addressing violations of the assumptions of statistical tests (e.g., ranks, log transformations).
- Improving the validity of regression models (e.g., basis functions).
- Reducing the amount of data (e.g., Principal Components Analysis).
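As one illustration of a transformation made for communication, the Net Promoter Score condenses a set of 0 to 10 likelihood-to-recommend ratings into a single number: the percentage of promoters (ratings of 9 or 10) minus the percentage of detractors (ratings of 0 to 6). The sketch below is minimal; the function name and the sample ratings are made up for illustration.

```python
def net_promoter_score(ratings):
    """Return the NPS (from -100 to 100) for a list of 0-10 ratings."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    promoters = sum(1 for r in ratings if r >= 9)   # ratings of 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)  # ratings of 0 to 6
    return 100.0 * (promoters - detractors) / len(ratings)


# Illustrative data only, not taken from any real survey.
ratings = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]
nps = net_promoter_score(ratings)
print(nps)
```

Ten individual ratings are replaced by one derived value, which is easier to report but discards the detail of the original distribution.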
Standard transformations of a categorical variable
A categorical variable can be transformed in one of two ways:
- It can be turned into a numeric variable, by assigning a numeric value to each category according to a stated rule.
- The categories of a categorical variable can be combined. Most commonly, small categories are merged into larger categories.
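Both transformations can be sketched in a few lines. The coding rule, the category labels, the `min_count` threshold and the "Other" label below are all assumptions chosen for illustration; in practice the rule depends on how the categories should be interpreted.

```python
from collections import Counter

# Hypothetical coding rule: map each agreement category to a number.
AGREEMENT_SCORES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}


def to_numeric(values, scores):
    """Turn a categorical variable into a numeric one using a lookup of scores."""
    return [scores[v] for v in values]


def merge_small_categories(values, min_count, other_label="Other"):
    """Replace every category that occurs fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


responses = ["Agree", "Neutral", "Strongly agree", "Disagree"]
numeric = to_numeric(responses, AGREEMENT_SCORES)

brands = ["A", "A", "A", "B", "B", "C", "D"]
merged = merge_small_categories(brands, min_count=2)
print(numeric, merged)
```

Equally spaced scores such as 1 to 5 treat the gaps between adjacent categories as equal, which is itself an assumption of the coding rule.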
Standard transformations of numeric variables
- Log transformations
- Principal Components Analysis
- Cluster Analysis
- Basis functions, such as:
  - Dummy variables
  - Orthogonal polynomials
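Two of the transformations listed above are simple enough to sketch directly: a log transformation, and dummy variables used as a basis for a numeric variable. The function names, cut points and sample values are hypothetical, and the banding rule (each band includes its lower edge and excludes its upper edge) is one possible convention rather than a standard.

```python
import math


def log_transform(values):
    """Natural log of each value; defined only for strictly positive inputs."""
    if any(v <= 0 for v in values):
        raise ValueError("log transformation requires positive values")
    return [math.log(v) for v in values]


def band_dummies(values, cut_points):
    """One 0/1 indicator per band; band i covers [edges[i], edges[i + 1])."""
    edges = [float("-inf")] + list(cut_points) + [float("inf")]
    return [
        [1 if lo <= v < hi else 0 for lo, hi in zip(edges, edges[1:])]
        for v in values
    ]


# Illustrative data: each income is ten times the previous one.
incomes = [1_000, 10_000, 100_000]
logged = log_transform(incomes)

# Illustrative data: ages banded as under 18, 18 to 64, and 65 or over.
ages = [17, 30, 65]
dummies = band_dummies(ages, cut_points=[18, 65])
print(logged, dummies)
```

After the log transformation, equal ratios in the original values become equal differences, which is why heavily skewed variables often become easier to read. Each row of `dummies` contains exactly one 1, marking the band the value falls in.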