Fill Missing Values

For more information, see the online learning platform

Fill missing values creates a variable or set of variables filling the missing values with a value designated by the strategy adopted. Missing values can be replaced by a default value, an average value, previous or next values and interpolation. It can be used for numeric and symbolic variables.

Fill missing values

If there were missing values (∅) recorded in the Lake, these missing values are maintained when the data is resampled. Therefore, you may choose a strategy to “Fill missing values”. 

To create a strategy of fill missing value of a certain variable:

  1. Click Transform > Fill missing values in the menu.
  2. Enter a name for the new variable set.  

    Variable set

    Note that Filling Missing Values creates a Variable set, containing several new variables. Therefore, to check your Filled Missing variables you can go to the third icon on the left bar.

  3. Enter the variable name, default prefix: CLEAN_
  4. Choose the Method.
  5. Enter a Value, in case, Default Type enter a Default Value.
  6. Select Record set.
  7. Enter the variables or the set of variables.
  8. Enter a Index variable to define the order of rows (records). e.g. Select timestamp if the rows are not in chronological order. The ordering is important for some fill methods: i.e. interpolation, previous and next.  
  9. Click Save.

The types used:

  • Averagethe arithmetic mean of the original variable. 
  • Default value: the value designated in the Default value field. 
  • Interpolation: the linear interpolation between the value of the previous record and the value of the next record.  
  • Previous: the value of the previous record.
  • Next: the value of the next record.  

How to choose method

The method depends on the type of data to fill and the logic for choosing a sampling method is similar to resampling data. 

  • If you have some missing continuous numerical data, you may choose “interpolate”. 
  • You can choose, “default value”, to replace all missing values by a given value. For example, you may choose to fill with 0, if you know that all missing values are when the line is stopped and therefore the value should be 0.  

You can also filter missing values using “Record Sets”, so that if a value is missing for one variable, the whole record line will be removed from the selection (=from the record set)

Be aware that such a rule:  will remove all rows where "Profit/hr" is below 3000, including any rows where “Profit/hr” is missing and that this impacts all variables (Record sets filter entire rows).