Data Transformation

  • Writer: Daniel Amartya
  • Feb 29, 2020
  • 1 min read

If a data set is skewed and/or contains outliers, the mean is pulled away from the centre of the distribution, and the standard deviation is inflated along with it. The standard deviation is the 'average' spread around the mean, so when the mean is off-centre, the spread measured around it is distorted too.
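As a rough illustration of the point above, the sketch below adds a single outlier to a small symmetric sample and compares the summary statistics. The numbers are made up for the example and are not from any real data set.

```python
import statistics

# Hypothetical sample, symmetric around 10
values = [8, 9, 10, 11, 12]
# Same sample with one large outlier added
with_outlier = values + [60]

mean_clean = statistics.mean(values)
sd_clean = statistics.stdev(values)

mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)
sd_out = statistics.stdev(with_outlier)

print(f"clean:        mean={mean_clean:.2f}  sd={sd_clean:.2f}")
print(f"with outlier: mean={mean_out:.2f}  median={median_out:.2f}  sd={sd_out:.2f}")
```

The median barely moves, while the mean and the standard deviation both shift noticeably because of one value.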


When the mean and standard deviation do not represent the 'average' of a data set, parametric hypothesis tests (the t-test and similar) are not appropriate, since these tests are built on the mean and standard deviation.


This is when data transformation is useful. We can transform the data in some way so that the distribution becomes approximately symmetric and hopefully also close to normal.
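A minimal sketch of this idea, assuming a log transformation and a hand-written skewness helper (the moment coefficient m3 / m2^1.5). The helper name `skewness` and the data are hypothetical and chosen so the effect is easy to see.

```python
import math

def skewness(xs):
    """Moment coefficient of skewness: m3 / m2**1.5, using population moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical positively skewed data (each value doubles the previous one)
raw = [1, 2, 4, 8, 16, 32, 64]

# The log transformation needs strictly positive values
log_raw = [math.log(x) for x in raw]

print(f"skewness before: {skewness(raw):.3f}")
print(f"skewness after:  {skewness(log_raw):.3f}")
```

Here the logged values are evenly spaced, so the transformed distribution is symmetric. Real data will rarely come out this cleanly.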


Tips:

  1. It is always better to work down the ladder. This means starting with transformations that correct mild positive or negative skewness before advancing to transformations that correct moderate positive or negative skewness.

  2. Name each transformed variable after the transformation applied to it. This will help you remember which transformation the numbers represent, which is especially helpful when working with many transformed variables.
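Both tips can be sketched together. The rungs used here (square root, then log, then reciprocal, for positive skew) are an assumption based on a commonly cited ordering and may not match the table in this post. The function name `work_down_ladder`, the tolerance of 0.5, and the `income` data are likewise hypothetical.

```python
import math

def skewness(xs):
    """Moment coefficient of skewness: m3 / m2**1.5, using population moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Assumed rungs for positive skew, mildest first; all need strictly positive data
LADDER = [
    ("sqrt", math.sqrt),
    ("log", math.log),
    ("recip", lambda x: 1.0 / x),  # note: reverses the ordering of the values
]

def work_down_ladder(name, xs, tolerance=0.5):
    """Return (name, values) for the mildest rung whose |skewness| <= tolerance."""
    if abs(skewness(xs)) <= tolerance:
        return name, list(xs)          # already roughly symmetric
    for label, fn in LADDER:
        transformed = [fn(x) for x in xs]
        if abs(skewness(transformed)) <= tolerance:
            # Tip 2: record the transformation in the variable name
            return f"{name}_{label}", transformed
    return name, list(xs)              # no rung was enough; data left unchanged

income = [1, 2, 4, 8, 16, 32, 64]      # hypothetical, strictly positive
new_name, new_values = work_down_ladder("income", income)
print(new_name, f"{skewness(new_values):.3f}")
```

For this sample the square root is not enough, so the search moves one rung down and stops at the log transformation.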


Table 1: Ladder of transformations: Commonly used transformations for reducing skewness in data distributions




