For more information, see the online learning platform

Once the data is loaded from the Lake into your Analytics project, you can select the sampled data with Variable Sets and Record Sets:

Variable Sets: selection of variables (column selection). Once your Variable Set is created, you can use this as a filter each time you must select some variables.
For example, you can create a Variable Set "temperature" with all your temperature variables, so that the temperatures are easier to select when needed.
Record Sets: selection of rows (timestamps or batches) based on some rules (row selection). For example, you can filter the data before a given time or when the production rate is above a given value.
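
As a rough illustration of the difference between the two kinds of selection, the sketch below models a dataset as a plain JavaScript array of rows. The data, the `temperatureSet` list and the `selectRecords` and `selectVariables` helpers are hypothetical names invented for this example; they are not part of DATAmaestro.

```javascript
// Hypothetical in-memory dataset: one object per record (row).
const data = [
  { time: 1, temp_in: 20.5, temp_out: 35.1, rate: 90 },
  { time: 2, temp_in: 21.0, temp_out: 36.4, rate: 120 },
  { time: 3, temp_in: 19.8, temp_out: 34.9, rate: 150 },
];

// A "variable set" is modelled as a list of column names.
const temperatureSet = ["temp_in", "temp_out"];

// A "record set" is modelled as the list of row indices that satisfy a rule.
function selectRecords(rows, rule) {
  const indices = [];
  rows.forEach((row, i) => { if (rule(row)) indices.push(i); });
  return indices;
}

// Apply both selections: keep only the chosen rows and the chosen columns.
function selectVariables(rows, indices, variableSet) {
  return indices.map((i) => {
    const picked = {};
    for (const name of variableSet) picked[name] = rows[i][name];
    return picked;
  });
}

const highRate = selectRecords(data, (row) => row.rate > 100);
const view = selectVariables(data, highRate, temperatureSet);
console.log(highRate); // row indices 1 and 2
console.log(view);     // temperature columns of those two rows
```

Neither helper modifies `data`, which mirrors the fact that a selection only filters what is shown or modelled.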

What is a Record?

A record is simply an indexed numerical value that identifies a specific instance (a data point) in a database. The identifiers are established based on the row index of the table.

Record Sets

After a file is uploaded into the DATAmaestro environment, you can select operators to define the rules for the record sets you want to create.

A record set does not delete or change the data; it simply filters out specific rows for graphs or models.

To create a record set:

Click Select > Record set in the menu.
Change the data source, if required.
Enter a name for the new record set, for example: Clean Data Set.
Click Add to open the rules definition.
Select an operator (see the table below) and complete the rule (see Record Set Rules).
Add more rules, as required.
Click Compute.

You cannot reorder the rules applied to the set of records. Try to plan the order before you create a series of rules. To remove a specific rule, click Clear, or Clear all to remove them all.

It can be a good idea to work with “compounding” Record Sets to avoid defining the same rules multiple times in multiple different Record Sets.

Typically, we recommend defining a Record Set for Clean-Steady State operations and using this as a starting point for other Record Sets (use the rule “Intersect” to begin with another record set).

Edit a Record Set

To access the Record Set editor:

Click Record Sets on the sidebar to view a list of the saved record sets.
Click the edit icon for the record set you want to edit.

A record set is a set of data points, specific instances, records or rows in a database. Record sets can be created based on a series of rules (First, Last, Random, Intersect, Filter, etc) or via rulers on all visualization graphs.

Record Set Rules

Name of Operator	What to Enter	How It Is Used
First	Number	Indicates records selected from the front of the current record set. For example, “First 100” will select the first 100 records (or rows) within the selected data set (or record set if combining record set rules).
Last	Number	Indicates records selected from the end of the current record set. For example, “last 100” will select the last 100 records (or rows) within the selected data set (or record set if combining record set rules).
Random	Number	Indicates records selected randomly. For example, “random 100” will randomly select 100 records (or rows) within the selected data set (or record set if combining record set rules).
Subseq	Numbers	Record set rules that span from row n to row m within the selected data set (or record set if combining record set rules).
Not-in	Record set	Record set rule that excludes all data contained within a specified record set.
Union	Record set	Record set rule that allows the combination of an existing record set with additional rules (or with additional record sets). When combining two (or more) record sets using union, this is equivalent to keeping data points that are in either record set 1 “OR” record set 2.
Intersect	Record set	Record set rule that allows the combination of an existing record set with additional rules (or with additional record sets). When combining two (or more) record sets using intersect, this is equivalent to keeping data points that are in both record set 1 “AND” record set 2.
Filter	Variable, control and number	Method for creating record sets based on filtering a particular variable (numerical or symbolic) based on the given filter rules (less than, greater than, etc.).
Filter missing	Variable set	Creates a rule that removes records (rows) that have at least one missing value among the variables of a given variable set. If a value is missing for one variable, the whole row is removed from the record set. NB: Data sets with a high proportion of missing data may result in empty record sets. Be aware that missing values also affect Filter rules: a rule that removes all rows where "Profit/hr" is below 3000 also removes any rows where "Profit/hr" is missing, and this impacts all variables (record sets filter entire rows).
Cyclic	Number	Method for creating record sets that keep or skip rows of a dataset. This can be used to create learning and testing sets systematically rather than completely at random. For example, if the record set keeps 100 and skips 10, then 100 rows are kept in the record set and the next 10 are skipped, then the next 100 are kept and 10 skipped, and so on until the end of the dataset. It is useful for ordered records such as time series.
Script filter	Script rules	Method for creating record sets based on scripting rules. Rules can be scripted in JavaScript, Python or R.
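
To make the behaviour of the positional and set-based operators more concrete, here is a minimal sketch that models a record set as an array of row indices. The representation and the function names are assumptions made for this illustration (including the choice to count Subseq rows from 1, inclusive); this is not how DATAmaestro implements the rules.

```javascript
// A record set is modelled here as an array of row indices (hypothetical representation).
const range = (n) => Array.from({ length: n }, (_, i) => i);

const first = (set, n) => set.slice(0, n);
const last = (set, n) => (n > 0 ? set.slice(-n) : []);
// Rows n to m, counted from 1 and inclusive (an assumption about the numbering).
const subseq = (set, n, m) => set.slice(n - 1, m);
// Keep `keep` rows, skip the next `skip` rows, and repeat until the end.
const cyclic = (set, keep, skip) =>
  set.filter((_, position) => position % (keep + skip) < keep);

// Union keeps rows in either set, Intersect rows in both, Not-in rows of a that are absent from b.
const union = (a, b) => [...new Set([...a, ...b])].sort((x, y) => x - y);
const intersect = (a, b) => a.filter((i) => b.includes(i));
const notIn = (a, b) => a.filter((i) => !b.includes(i));

const all = range(10);                 // rows 0..9
const head = first(all, 4);            // [0, 1, 2, 3]
const tail = last(all, 3);             // [7, 8, 9]
const middle = subseq(all, 3, 6);      // [2, 3, 4, 5]
console.log(cyclic(all, 2, 1));        // [0, 1, 3, 4, 6, 7, 9]
console.log(union(head, tail));        // [0, 1, 2, 3, 7, 8, 9]
console.log(intersect(head, middle));  // [2, 3]
console.log(notIn(all, head));         // [4, 5, 6, 7, 8, 9]
```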

Example of record set:

Click the clone icon to clone a rule, for example to duplicate the selected filter for a specific variable and then add a threshold. Drag and drop rules to change their order.

How to read the results:

Initially there are 15867 rows or records in the data set.

After filtering rows to keep only >= 250, there are 15830.

Finally, after filtering rows to keep only <= 550, there are 15781 rows.
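
The way the row count shrinks after each rule can be sketched as follows. The values are made up for the illustration (they are not the data behind the counts above), and treating a missing value as failing every filter is an assumption consistent with the note on missing values in the rules table.

```javascript
// Hypothetical measurements; null marks a missing value.
const values = [120, 260, 300, null, 555, 549, 250, 700];

// Each rule keeps the rows (by index) that pass, starting from the previous result.
const rules = [
  { label: ">= 250", test: (v) => v !== null && v >= 250 },
  { label: "<= 550", test: (v) => v !== null && v <= 550 },
];

let current = values.map((_, i) => i);   // start with every row
const counts = [current.length];
for (const rule of rules) {
  current = current.filter((i) => rule.test(values[i]));
  counts.push(current.length);
  console.log(`after ${rule.label}: ${current.length} rows`);
}
console.log(counts); // [8, 6, 4]
```

Because each rule starts from the rows left by the previous one, the counts can only stay the same or decrease from one rule to the next.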

Example of record set using a Script filter:

You can add a rule using a script filter. Select the language you are going to use; there are three options: JavaScript, R and Python. Write the script in the script area.

/* Keep a row when the value of variable1 < 1000, or when the value of variable2 >= 80 and the symbolic variable3 equals "ON" */
val("variable1") < 1000 || (val("variable2") >= 80 && val("variable3") == "ON")
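
To see which rows such an expression would keep, the sketch below evaluates the same condition outside the product. The rows and the `makeVal` helper are hypothetical stand-ins for the `val()` function that the script filter provides; only the Boolean expression itself comes from the example above.

```javascript
// Hypothetical rows; in the product, val() reads the current row for you.
const rows = [
  { variable1: 500,  variable2: 10, variable3: "OFF" },
  { variable1: 1500, variable2: 85, variable3: "ON" },
  { variable1: 1500, variable2: 85, variable3: "OFF" },
  { variable1: 2000, variable2: 50, variable3: "ON" },
];

// Stand-in for val(): build a lookup bound to one row.
const makeVal = (row) => (name) => row[name];

// The same Boolean expression as in the script filter above.
const keepRow = (val) =>
  val("variable1") < 1000 || (val("variable2") >= 80 && val("variable3") == "ON");

const kept = rows
  .map((row, i) => (keepRow(makeVal(row)) ? i : -1))
  .filter((i) => i >= 0);
console.log(kept); // [0, 1]
```

Row 0 passes on the first condition alone, row 1 passes on the combined second condition, and the last two rows fail both.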

For the rules First, Last and Random it is possible to select the percentage of the dataset as well as the number of records (Rows).

For example, “random percentage 75” will randomly select 75% of the data set.
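
A percentage rule amounts to converting the percentage into a number of records and then applying the usual rule. The sketch below shows one way to do this for Random; the helper names and the rounding choice are assumptions for this illustration, not a description of how DATAmaestro computes the selection.

```javascript
// Convert a percentage into a number of records (rounding is an assumption).
const countFromPercentage = (total, percent) => Math.round((total * percent) / 100);

// Pick `n` distinct row indices at random using a partial Fisher-Yates shuffle.
function randomRecords(total, n) {
  const indices = Array.from({ length: total }, (_, i) => i);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(Math.random() * (total - i));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  return indices.slice(0, n).sort((a, b) => a - b);
}

const n = countFromPercentage(200, 75);   // 150 records out of 200
const sample = randomRecords(200, n);
console.log(n, sample.length);
```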

Find your record sets more easily with the new filter options.

Depending on the type of analysis and the type of outliers, outliers may or may not need to be removed. For example, if an outlier represents a measurement error, it is best to remove it. Alternatively, if an outlier represents a process upset that you would like to investigate, it should be left in the data set.

Remove outliers by defining data filtering rules with “Record Sets”.

You can create a function from any record set:

Export Record Sets to a function
Set names for points within or outside the Record Set
View the result on charts to understand which points are within the Record Set
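
Conceptually, exporting a record set to a function produces a symbolic variable that names each row according to whether it belongs to the record set. A minimal sketch, in which the `recordSetToFunction` helper and the "Training" and "Other" names are hypothetical:

```javascript
// Hypothetical record set: indices of the rows that belong to it.
const recordSet = new Set([0, 2, 3]);
const totalRows = 5;

// Build a symbolic column that names each row by its membership.
function recordSetToFunction(set, total, insideName, outsideName) {
  return Array.from({ length: total }, (_, i) =>
    set.has(i) ? insideName : outsideName
  );
}

const labels = recordSetToFunction(recordSet, totalRows, "Training", "Other");
console.log(labels); // ["Training", "Other", "Training", "Training", "Other"]
```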

What is a Variable?

A variable is a property or characteristic of a record (for example, the weight of a mechanical piece, the time at which an event occurred or the eye color of a person) that varies from record to record. A variable is either numerical or symbolic:

numerical: its value is an integer or real number. Such values can be numerically ordered and compared.
symbolic: its value is a string or symbol. It is qualitative and generally cannot be ordered (except for symbolic variables such as low, medium, high, which imply an intuitive order).

Variable Sets

To build models, you may decide to first select a set of variables to use as inputs to the modeling methods. These variables are usually called candidate variables.

You may also want to create subsets of variables to represent specific groups of variables (for example, ambient physical characteristics, process parameters, or quality-related variables).

To create a variable set:

Click Select > Variable set in the menu.
Change the data source, if required.
Enter a name for the new variable set, for example: My Candidates.
Select variables from the list, and click the arrow to add them to the set.
Click Save.

To select multiple variables from the Variable List, use Shift+Click for adjoining variables and Ctrl+Click for individual variables.

On all Editor pages (charts, models, etc.), user preferences are saved in the browser, including:

Resizing column widths to fit the information,
Changing the column order according to importance,
Hiding empty columns.

When a Variable Set is selected, all variables within that set are highlighted:

One Variable Set has been selected, which contains 22 variables
The 22 variables have been highlighted to indicate that they have been selected via the Variable Set

Edit a Variable Set

To access the Variable Set editor:

Click Variable Sets on the sidebar to view a list of the saved variable sets.
Click the edit icon for the variable set you want to edit.

To see what is in a Variable Set and a Record Set, click Reports > Data Export, select the variables and apply a Record Set to filter the data. You will then see a list of all the values from your selection, filtered by your Record Set.

Classify Variables

You may want to classify the variables to describe whether a variable is manipulable or a disturbance, a measurement or a setpoint, reliable or unreliable. Classifiers keep track of key information about variables.

Classifiers pre-defined in DATAmaestro Analytics are:

Parameters: characteristic that classifies the type of a variable in a dataset. E.g.: temperature, pressure, flow, etc.
Location: classifies a place or equipment. E.g.: Plant 1, etc.
Signal type: defines the type of signal. E.g.: measurement, setpoint, specification, etc.
Classification: defines a category for the variable. E.g.: manipulable, disturbance, output, etc.
Frequency: defines the rate at which the variable is collected. E.g.: minute, seconds, etc.
Accuracy: classifies the precision of the variable. E.g.: 0-1%, reliable, unreliable, etc.
Min/Max: define the minimum and maximum values of the variable.

Classifiers are stored with the data source.

To classify variables:

Click Select > Classify variables in the menu.
In the variable list, enter information for the different variables, for example, Title: My Title, Classification: Manipulable.
Select more than one variable and edit them all at once by clicking on Bulk Edit.
Click Save.

To Move, Edit, Hide, Resize and Remove Classifiers (directly from the column header):

It is possible to move, edit, hide, resize and remove a classifier directly from the column header:

To move a column, hover over the three vertical dots icon in its header and then, with the grabbing hand cursor, drag the column to its new position. Drop it once a blue line indicates the new position.
To hide a classifier, click - .
To resize a column, use | .
To remove a classifier, click x.
To edit a classifier, click a cell and select the information.

Remove all filters by using the "Trash" icon at the top right of the table.

To Add, Edit, Hide and Remove Classifiers (using "Edit classifiers"):

Click “Edit Classifiers”.
The “Edit Classifiers” window has 5 columns that let you modify the set of classifiers:
1. Change the Name of a classifier
2. Change the Description of a classifier
3. Change the predefined Values that a classifier can take
4. Hide a classifier
5. Delete a classifier
The “+ Add Classifier” icon lets you create a new classifier. A line appears at the end of the list.

In Script tab:

Select the Language to be used to write the script. There are three options: JavaScript, R and Python.
Write the script in the script area.
Click Run to launch the script. The message Done appears beside the Run button once the script has finished. If there are errors in the script, an error message appears in this area. Note that the script is not saved.
Check the results in the Classify tab.

In this example, the variables that contain "Temp" in their names are classified as "Temperature". Note that the match is case sensitive: only variables with "Temp" (capital T) are classified, and those with "temp" (lowercase t) are not.
It is also possible to replace "contains" with "startsWith".

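// Classify every variable whose name contains "Temperature" or "temperature".
var attributes = inputs.attributes;
var result = output.createArrayResult('attributes');
for (var i = 0; i < attributes.length; i++)
{
   var attribute = attributes[i];
   // The match is case sensitive, so both spellings are tested explicitly.
   if (attribute.name.contains('Temperature') || attribute.name.contains('temperature'))
   {
      // Identify the variable by its index and set its "Parameters" classifier.
      var resultAttribute = {id:i};
      resultAttribute.classifiers = {Parameters:'Temperature'};
      result.push(resultAttribute);
   }
}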
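
To experiment with this logic outside the product, the sketch below replaces `inputs` and `output` with minimal stand-ins. Their shapes are guesses based only on how the script above uses them, so treat them as hypothetical; the variable names are invented too. It uses the `startsWith` variant mentioned above.

```javascript
// Minimal stand-ins for the objects the product provides (hypothetical shapes).
const inputs = {
  attributes: [{ name: "Temp_inlet" }, { name: "temp_outlet" }, { name: "Pressure" }],
};
const output = {
  results: {},
  createArrayResult(key) {
    this.results[key] = [];
    return this.results[key];
  },
};

// Same logic as the script above, using startsWith and a case-sensitive prefix.
const result = output.createArrayResult("attributes");
inputs.attributes.forEach((attribute, i) => {
  if (attribute.name.startsWith("Temp")) {
    result.push({ id: i, classifiers: { Parameters: "Temperature" } });
  }
});

console.log(result); // only index 0: "temp_outlet" has a lowercase t
```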

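Select

What is an Object?

An object is an indexed numerical value that identifies a specific instance, or record, in a database. The identifiers are established based on the row index of the table.

Object Sets

After a file is uploaded into the DATAmaestro environment, you can select operators to define the rules for the object sets you want to create.

Object Sets

An object set does not delete or change the data; it only filters out specific rows for graphs or models.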

Create an object set

Click Select > Object set in the menu.
Change the data source, if required.
Enter a name for the new object set, for example: My Clean
Click Add to open the rules definition.
Select an operator (see the table below) and complete the rule (see Object Set Rules).
Add more rules, as required.
Click Compute.

Rule order

You cannot reorder the rules applied to a set of objects. Plan the order before you create a series of rules. To remove a specific rule, click Clear. To remove all rules, click Clear all.

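Edit an Object Set

To access the Object Set editor:

Click Object Sets on the sidebar to view a list of the saved object sets.
Click the edit icon for the object set you want to edit.

Object set tooltip

An object set is a set of data points, specific instances, records or rows in a database. Object sets can be created based on a series of rules (First, Last, Random, Intersect, Filter, etc.) or via rulers on all visualization graphs.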

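Object Set Rules

Name of Operator	What to Enter	How It Is Used
First	Number	Indicates objects selected from the front of the current object set. For example, “First 100” will select the first 100 objects (or rows) within the selected data set (or object set if combining object set rules).
Last	Number	Indicates objects selected from the end of the current object set. For example, “Last 100” will select the last 100 objects (or rows) within the selected data set (or object set if combining object set rules).
Random	Number	Indicates objects selected randomly. For example, “random 100” will randomly select 100 objects (or rows) within the selected data set (or object set if combining object set rules).
Subseq	Numbers	Object set rule that spans from row n to row m within the selected data set (or object set if combining object set rules).
Not-in	Object set	Object set rule that excludes all data contained within a specified object set.
Union	Object set	Object set rule that allows the combination of an existing object set with additional rules (or with additional object sets). Combining two (or more) object sets using union is equivalent to keeping data points that are in object set 1 “OR” object set 2.
Intersect	Object set	Object set rule that allows the combination of an existing object set with additional rules (or with additional object sets). Combining two (or more) object sets using intersect is equivalent to keeping data points that are in object set 1 “AND” object set 2.
Filter	Attribute, control and number	Method for creating object sets by filtering a particular attribute (numerical or symbolic) based on the given filter rules (less than, greater than, etc.).
Filter missing	Attribute set	Creates a rule that removes objects (rows), for a given attribute set, that have at least one missing attribute value. NB: Data sets with a high proportion of missing data may result in empty object sets.
Cyclic	Number	Method for creating object sets that keep or skip rows of a dataset. This can be used to create learning and testing sets systematically rather than completely at random. For example, if the object set keeps 100 and skips 10, then 100 rows are kept in the object set and the next 10 are skipped, then the next 100 are kept and 10 skipped, and so on until the end of the dataset. It is useful for ordered objects such as time series.
Script filter	Script rules	Method for creating object sets based on scripting rules. Rules can be scripted in JavaScript, Python or R.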

Example of an object set

How to read the results

Initially there are 15854 rows or objects in the data set.

After filtering rows to keep only >= 250, there are 15827 rows.

Finally, after filtering rows to keep only <= 550, there are 15778 rows.

Percentage

For the rules First, Last and Random, it is possible to select a percentage of the data set as well as a number of objects (rows).

For example, “random percentage 75” will randomly select 75% of the data set.

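What is an Attribute?

An attribute is a property or characteristic of an object (for example, the weight of a mechanical piece, the time at which an event occurred or the eye color of a person) that varies from object to object.

Numerical: its value is an integer or real number. Such values can naturally be ordered numerically and compared.
Symbolic: its value is a string or symbol. It is qualitative and generally cannot be ordered (except for symbolic attributes such as low, medium, high, which imply an intuitive order).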

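Attribute Sets

To build models, you first need to select a set of attributes to use as inputs to the modeling methods. These attributes are usually called candidate attributes.

You can also create subsets of attributes to represent specific groups of variables (for example, ambient physical characteristics, process parameters, or quality-related variables).

Create an attribute set

Click Select > Attribute set in the menu.
Change the data source, if required.
Enter a name for the new attribute set, for example: My Candidates
Select attributes from the list, and click the arrow to add them to the set.
Click Save.

Multiple selection

To select multiple attributes from the Attribute List, use Shift+Click for adjoining attributes and Ctrl+Click for individual attributes.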

Edit an Attribute Set

To access the Attribute Set editor:

Click Attribute Sets on the sidebar to view a list of the saved attribute sets.
Click the edit icon for the attribute set you want to edit.

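Classify Attributes

You can classify attributes to describe whether an attribute is manipulable or a disturbance, a measurement or a setpoint, reliable or unreliable.

Classifiers pre-defined in DATAmaestro Analytics are:

Parameters: characteristic that classifies the type of a variable in a dataset. E.g.: temperature, pressure, flow, etc.
Location: classifies a place or equipment. E.g.: Plant 1, etc.
Signal type: defines the type of signal. E.g.: measurement, setpoint, specification, etc.
Classification: defines a category for the variable. E.g.: manipulable, disturbance, output, etc.
Frequency: defines the rate at which the variable is collected. E.g.: minute, seconds, etc.
Accuracy: classifies the precision of the variable. E.g.: 0-1%, reliable, unreliable, etc.
Min/Max: define the minimum and maximum values of the variable.

To classify attributes:

Click Select > Classify attributes in the menu.
In the attribute list, enter information for the different attributes, for example, Title: My Title, Classification: Manipulable.
Click Save.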

To Add, Edit, Hide and Remove Classifiers (directly from the column header):

You can add, edit and remove a classifier directly from the column header.

To edit a classifier name, click the header (it turns green), edit the name, and press Enter.
To hide a classifier, click - directly in the header.
To remove a classifier, click x.
To add a new classifier, click + (last column). In the New custom classifier tab, enter a name, a description (optional) and values. Click Add.
The Hidden classifiers tab lists the classifiers that have been hidden. Select a check box to show the classifier again. Click Add.

To Add, Edit, Hide and Remove Classifiers (using "Edit classifiers"):

Click Edit classifiers.
The Edit classifiers window has 5 columns that let you modify the set of classifiers:

Change the Name of a classifier
Change the Description of a classifier
Change the predefined Values that a classifier can take
Hide a classifier
Delete a classifier

The + Add Classifier icon lets you create a new classifier. A line appears at the end of the list.