Once the data is loaded from the Lake into your Analytics project, you can select the sampled data with Variable Sets and Record Sets:

  • Variable Sets: selection of variables (column selection). Once your Variable Set is created, you can use this as a filter each time you must select some variables.
    For example, you can create a Variable Set "temperature" with all your temperature variables, so that the temperatures are easier to select when needed.
  • Record Sets: selection of rows (timestamps or batches) based on some rules (row selection). For example, you can filter the data before a given time or when the production rate is above a given value. 

What is a Record?

A record is a specific instance (a data point, or row) in a database, identified by an indexed numerical value. The identifiers are established based on the row index of the table.

Record Sets

After a file is uploaded into the DATAmaestro environment, you can select operators to define the rules for the record sets you want to create.

Info: Record set

A record set does not delete or change the data; it simply filters out specific rows for graphs or models.

To create a record set:

  1. Click Select > Record set in the menu.
  2. Change the data source, if required.
  3. Enter a name for the new record set, for example: Clean Data Set.
  4. Click Add to open the rules definition.
  5. Select an operator and complete the rule (see the Record Set Rules table below).
  6. Add more rules, as required.
  7. Click Compute.

Tip: Order of Rules

You cannot reorder the rules applied to the set of records. Try to plan the order before you create a series of rules. To remove a specific rule, click Clear, or click Clear all to remove them all.

Edit a Record Set

To access the Record Set editor:

  1. Click Record Sets on the sidebar to view a list of the saved record sets.
  2. Click the edit icon for the record set you want to edit.


Tip: Record set tool tip

A record set is a set of data points (specific instances, records, or rows) in a database. Record sets can be created from a series of rules (First, Last, Random, Intersect, Filter, etc.) or via rulers on all visualization graphs.

Record Set Rules

Name of Operator | What to Enter | How It Is Used
First | Number | Selects records from the front of the current record set. For example, “First 100” selects the first 100 records (or rows) within the selected data set (or record set, if combining record set rules).
Last | Number | Selects records from the end of the current record set. For example, “Last 100” selects the last 100 records (or rows) within the selected data set (or record set, if combining record set rules).
Random | Number | Selects records randomly. For example, “Random 100” randomly selects 100 records (or rows) within the selected data set (or record set, if combining record set rules).
Subseq | Numbers | Selects the rows from row n to row m within the selected data set (or record set, if combining record set rules).
Not-in | Record set | Excludes all data contained within a specified record set.
Union | Record set | Combines an existing record set with additional rules (or with additional record sets). Combining two (or more) record sets using union keeps the data points that are in either record set 1 OR record set 2.
Intersect | Record set | Combines an existing record set with additional rules (or with additional record sets). Combining two (or more) record sets using intersect keeps only the data points that are in both record set 1 AND record set 2.
Filter | Variable, control and number | Creates a record set by filtering on a particular variable (numerical or symbolic) according to the given filter rule (less than, greater than, etc.).
Filter missing | Variable set | Removes the records (rows) that have at least one missing value among the variables of a given variable set. NB: Data sets with high proportions of missing data may result in empty record sets.
Cyclic | Number | Creates a record set that alternately keeps and skips rows of a data set. This can be used to create learning and testing sets systematically rather than completely at random. For example, with keep 100 and skip 10, 100 rows are kept in the record set and the next 10 are skipped, then the next 100 are kept and 10 skipped, and so on until the end of the data set. It is useful for ordered records such as time series.
Script filter | Script rules | Creates a record set based on scripted rules. Rules can be scripted in JavaScript, Python or R.
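
To make the selection operators above more concrete, here is a minimal sketch that models a record set as an array of row indices. It is an illustration only: the function names are hypothetical and are not part of DATAmaestro, and the assumption that Subseq counts rows from 1 and includes both ends is ours.

```javascript
// Illustrative only: record sets modelled as arrays of row indices.
// None of these function names come from DATAmaestro.
const range = (n) => Array.from({ length: n }, (_, i) => i);

const first = (rows, n) => rows.slice(0, n);
const last = (rows, n) => (n > 0 ? rows.slice(-n) : []);

// Subseq: rows n to m, counted from 1 and inclusive (an assumption).
const subseq = (rows, n, m) => rows.slice(n - 1, m);

// Cyclic: keep `keep` rows, skip `skip` rows, repeat until the end.
const cyclic = (rows, keep, skip) =>
  rows.filter((_, i) => i % (keep + skip) < keep);

// Set-style operators on two record sets.
const union = (a, b) => [...new Set([...a, ...b])].sort((x, y) => x - y);
const intersect = (a, b) => {
  const inB = new Set(b);
  return a.filter((r) => inB.has(r));
};
const notIn = (a, b) => {
  const inB = new Set(b);
  return a.filter((r) => !inB.has(r));
};

const all = range(10);                // rows 0..9
const learning = cyclic(all, 3, 2);   // [0, 1, 2, 5, 6, 7]
const testing = notIn(all, learning); // [3, 4, 8, 9]
console.log(learning, testing);
```

Here the Cyclic rule builds a learning set and Not-in yields the complementary testing set, which mirrors the learning/testing use described in the table.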


Example of record set: 

How to read the results: 

Initially there are 15867 rows or records in the data set.

After filtering rows to keep only values >= 250, 15830 rows remain.

Finally, after filtering rows to keep only values <= 550, 15781 rows remain.
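
The way row counts shrink as rules are applied one after another can be sketched as follows. The values are made up for illustration (they are not the data from the example), and the rule objects are a hypothetical representation, not DATAmaestro's internal one.

```javascript
// Made-up values for a single variable; not the data from the example.
const values = [120, 250, 300, 549, 550, 551, 700, 410];

// Each rule is applied to the rows kept by the previous rule.
const rules = [
  { label: ">= 250", keep: (v) => v >= 250 },
  { label: "<= 550", keep: (v) => v <= 550 },
];

let keptRows = values.map((_, i) => i); // start with every row index
const counts = [keptRows.length];       // row count before any rule

for (const rule of rules) {
  keptRows = keptRows.filter((i) => rule.keep(values[i]));
  counts.push(keptRows.length);
  console.log(`after ${rule.label}: ${keptRows.length} rows`);
}
```

With these toy values the counts go from 8 to 7 to 5, in the same way the example goes from 15867 to 15830 to 15781.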


Example of record set using a Script filter

You can add a rule using a script filter. Select the language you are going to use (there are three options: JavaScript, R and Python), then write the script in the script area.


Code Block (js): Record set script filter example
val("variable1") < 1000 || (val("variable2") >= 80 && val("variable3") == "ON")
    /* Value of variable1 < 1000, or [ value of variable2 >= 80 and value of symbolic variable3 equal to "ON" ] */
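
To check how such a condition behaves before using it, you can evaluate it outside the tool on a few hand-written rows. The sketch below is hypothetical: `makeFilter` and the local `val` are stand-ins written for this illustration and only imitate the idea that `val(name)` returns a variable's value for the row being tested.

```javascript
// Hypothetical harness: val(name) reads a variable's value from the
// row currently being tested. This is not DATAmaestro's implementation.
function makeFilter(expression) {
  return (row) => {
    const val = (name) => row[name];
    return expression(val);
  };
}

// Same condition as the script filter example.
const keepRow = makeFilter(
  (val) =>
    val("variable1") < 1000 ||
    (val("variable2") >= 80 && val("variable3") == "ON")
);

// Hand-written test rows (illustrative values).
const sampleRows = [
  { variable1: 500, variable2: 10, variable3: "OFF" },  // kept: variable1 < 1000
  { variable1: 1500, variable2: 90, variable3: "ON" },  // kept: second branch
  { variable1: 1500, variable2: 90, variable3: "OFF" }, // dropped
  { variable1: 1500, variable2: 50, variable3: "ON" },  // dropped
];

const kept = sampleRows.map(keepRow);
console.log(kept);
```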


Info: Percentage

For the rules First, Last and Random, you can specify a percentage of the data set instead of a number of records (rows).

For example, “random percentage 75” will randomly select 75% of the data set.
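
As a rough sketch of what a percentage-based random selection involves, the snippet below converts a percentage into a record count and draws that many rows without replacement. The rounding rule (rounding down) and both function names are assumptions made for this illustration; the document does not say how DATAmaestro rounds.

```javascript
// Assumption: the record count is the percentage of the rows, rounded down.
function percentageToCount(totalRows, percentage) {
  return Math.floor((totalRows * percentage) / 100);
}

// Random selection without replacement (Fisher-Yates shuffle on a copy),
// returned in ascending row order.
function randomRows(rows, count, rng = Math.random) {
  const copy = rows.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy.slice(0, count).sort((a, b) => a - b);
}

const allRows = Array.from({ length: 200 }, (_, i) => i);
const sample = randomRows(allRows, percentageToCount(allRows.length, 75));
console.log(sample.length); // 150 of the 200 rows
```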

Tip: Find your Record sets

Find your record sets more easily with the new filter options.


Tip: Outliers

Depending on the type of analysis and the type of outlier, outliers may or may not need to be removed. For example, if an outlier represents a measurement error, it is best to remove it. Alternatively, if an outlier represents a process upset that you would like to investigate, it should be left in the data set.

Remove outliers by defining data filtering rules with “Record Sets”.


What is a Variable?

A variable is a property or characteristic of a record (for example, the weight of a mechanical piece, the time at which an event occurred or the eye color of a person) that varies from record to record. A variable is one of two types:

  • numerical: its value is an integer or real number. Such values can be numerically ordered and compared.
  • symbolic: its value is a string or symbol. It is qualitative and generally cannot be ordered (except for symbolic variables such as low, medium, high, which imply an intuitive order).
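
The distinction between the two types can be sketched in a few lines. This is only an illustration with a hypothetical helper: it assumes a column is numerical when every non-missing value is a finite number, which is not necessarily the rule DATAmaestro applies on import.

```javascript
// Assumption: a column is numerical when every non-missing value is a
// finite number; anything else is treated as symbolic.
function variableType(values) {
  const present = values.filter(
    (v) => v !== null && v !== undefined && v !== ""
  );
  const numeric = present.every(
    (v) => typeof v === "number" && Number.isFinite(v)
  );
  return numeric ? "numerical" : "symbolic";
}

console.log(variableType([20.5, 21, null])); // numerical
console.log(variableType(["ON", "OFF"]));    // symbolic

// Symbolic values have no built-in order; an intuitive order such as
// low < medium < high has to be stated explicitly.
const order = ["low", "medium", "high"];
const sorted = ["high", "low", "medium"].sort(
  (a, b) => order.indexOf(a) - order.indexOf(b)
);
console.log(sorted); // [ 'low', 'medium', 'high' ]
```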

Variable Sets

To build models, you may decide to first select a set of variables to use as inputs to the modeling methods. These variables are usually called candidate variables.

You may also want to create subsets of variables to represent specific groups of variables (for example, ambient physical characteristics, process parameters, or quality-related variables).
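
Conceptually, a variable set is a named list of columns. The sketch below shows that idea on a tiny made-up table; the variable names, the values, and the `selectVariables` helper are all hypothetical and only illustrate column selection.

```javascript
// Made-up table: one object per record, one property per variable.
const table = [
  { temp_in: 20.5, temp_out: 35.1, pressure: 1.2, quality: "OK" },
  { temp_in: 21.0, temp_out: 36.4, pressure: 1.3, quality: "NOK" },
];

// A variable set is just a named list of variable (column) names.
const variableSets = {
  temperature: ["temp_in", "temp_out"],
  candidates: ["temp_in", "temp_out", "pressure"],
};

// Keep only the columns listed in the variable set.
function selectVariables(rows, names) {
  return rows.map((row) =>
    Object.fromEntries(names.map((name) => [name, row[name]]))
  );
}

const temperatures = selectVariables(table, variableSets.temperature);
console.log(temperatures);
```

Once the set is defined, the same list of names can be reused wherever those variables are needed, which is the convenience a saved variable set provides.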

To create a variable set:

  1. Click Select > Variable set in the menu.
  2. Change the data source, if required.
  3. Enter a name for the new variable set, for example: My Candidates.
  4. Select variables from the list, and click the arrow to add them to the set.

  5. Click Save.

Tip: Selecting Multiples

To select multiple variables from the Variable List, use Shift+Click for adjoining variables, and Ctrl+Click to include individual variables.

Edit a Variable set

To access the Variable Set editor:

  1. Click Variable Sets on the sidebar to view a list of the saved variable sets.
  2. Click the edit icon for the variable set you want to edit.

Tip

To see the contents of a Variable Set and a Record Set, click "Reports > Data Export", select the variables, and apply a “Record Set” to filter the data. This displays a list of all the values from your selection, filtered by your “Record Set”.

Classify Variables

You may want to classify the variables to describe whether a variable is manipulable or a disturbance, a measurement or a set point, reliable or unreliable.

Classifiers pre-defined in DATAmaestro Analytics are:

  • Parameters: the characteristic that classifies the type of a variable in a data set. E.g.: temperature, pressure, flow, etc.
  • Location: classifies a place or equipment. E.g.: Plant 1, etc.
  • Signal type: defines the type of signal. E.g.: measurement, setpoint, specification, etc.
  • Classification: defines a category for the variable. E.g.: manipulable, disturbance, output, etc.
  • Frequency: defines the rate at which the variable is collected. E.g.: minute, seconds, etc.
  • Accuracy: classifies the precision of the variable. E.g.: 0-1%, reliable, unreliable, etc.
  • Min/Max: defines the minimum and maximum values of the variable.

To classify variables:

  1. Click Select > Classify variables in the menu.
  2. In the variable list, enter the information for each variable, for example, Title: My Title, Classification: Manipulable.
  3. Click Save.

To Add, Edit, Hide and Remove Classifiers (directly from the column header):

  1. To edit a classifier Name, click on the header (it turns light green), edit the Name and press “Enter”.
  2. To hide a classifier, click on - in the header.
  3. To remove a classifier, click on x.
  4. To add a new classifier, click on + (last column). In the New Custom Classifier tab, enter the Name, Description (optional) and Values. Click “Add”.
  5. The Hidden Classifiers tab lists the classifiers that were hidden. To display one again, select its checkbox and click “Add”.


To Add, Edit, Hide and Remove Classifiers (using "Edit classifiers"):

  1. Click on “Edit Classifiers”.
  2. The “Edit Classifiers” window has five columns that let you modify the set of classifiers:
    1. Change the Name of a classifier
    2. Change the Description of a classifier
    3. Change the predefined Values that a classifier can take
    4. Hide a classifier
    5. Delete a classifier
  3. The “+ Add Classifier” icon lets you create a new classifier. A new line appears at the end of the list.


In the Script tab:

  1. Select the Language to be used to write the script. There are three options: JavaScript, R and Python.
  2. Write the script in the script area.
  3. Click Run to launch the script. The message Done appears beside the Run button once the script has finished. If the script contains errors, an error message is shown in the same area. Note that the script is not saved.
  4. Check the results in the Classify tab.



Code Block (js): Classify variables
In this example, variables whose names contain "Temperature" or "temperature" are classified as "Temperature". Note that the match on the variable name is case sensitive, which is why both spellings are tested explicitly.
It is also possible to replace "contains" with "startsWith".

// Variables (attributes) of the current data source
var attributes = inputs.attributes;
// Array that collects the classification results
var result = output.createArrayResult('attributes');
for (var i = 0; i < attributes.length; i++)
{
   var attribute = attributes[i];
   // Case-sensitive test on the variable name
   if (attribute.name.contains('Temperature') || attribute.name.contains('temperature'))
   {
      // Identify the variable by its index and set its Parameters classifier
      var resultAttribute = {id:i};
      resultAttribute.classifiers = {Parameters:'Temperature'};
      result.push(resultAttribute);
   }
}
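
The "startsWith" variant mentioned above can be tried outside the tool with mock objects. In this sketch, `inputs` and `output` are hypothetical stand-ins that only imitate the shape used in the script above (an `attributes` array and a `createArrayResult` method), and the variable names are invented for the illustration.

```javascript
// Mock stand-ins for the objects the Script tab provides.
// Their shape is assumed from the script above; the names are invented.
const inputs = {
  attributes: [
    { name: "Temp_Reactor" },
    { name: "temp_ambient" },
    { name: "Pressure" },
    { name: "Temperature_Out" },
  ],
};
const output = {
  results: {},
  createArrayResult: (key) => {
    const arr = [];
    output.results[key] = arr;
    return arr;
  },
};

const attributes = inputs.attributes;
const result = output.createArrayResult("attributes");
for (let i = 0; i < attributes.length; i++) {
  // Case-sensitive prefix test: "temp_ambient" is not matched.
  if (attributes[i].name.startsWith("Temp")) {
    result.push({ id: i, classifiers: { Parameters: "Temperature" } });
  }
}
console.log(result.map((r) => r.id));
```

With these mock names, only the variables at index 0 and 3 are classified, which shows the effect of case sensitivity on the lowercase name.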

