English us

Records and variables are at the core of data management. This section covers how to define and select records and variables and how to organize them into subsets.

Data Basics

The rows and columns identified in the data source you upload provide the building blocks for your project. The columns in the file become variables, for example, Date, Gas Production, Valve1, Valve2, or Temperature. All the variables in the data source are available to use individually, but not all of them are relevant for every analysis. To get the most out of some features in DATAmaestro, you can create Variable sets. A Variable set is simply a set of columns (variables) that you select from all of the available variables in the data source.

Each row in the data source is considered a record. A single data source can have hundreds or even millions of records. To use the data effectively, you define rules with operators, for example, a time frame, values greater than X, or values not equal to Y. Each set of rules you create and name is called a Record set. One project can have many Record sets to target different types of data analysis.
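To make the two notions concrete, here is a minimal Python sketch of a Variable set (a selection of columns) and a Record set (a named group of rules applied to rows). It does not use any DATAmaestro API; the sample rows, the thresholds, and the helper `apply_sets` are assumptions made only for illustration.

```python
from datetime import date

# Hypothetical rows, as they might come from an uploaded CSV file.
records = [
    {"Date": date(2023, 1, 1), "Gas Production": 120.0, "Valve1": "open", "Temperature": 71.5},
    {"Date": date(2023, 1, 2), "Gas Production": 95.0, "Valve1": "closed", "Temperature": 64.0},
    {"Date": date(2023, 1, 3), "Gas Production": 130.0, "Valve1": "open", "Temperature": 80.2},
]

# A variable set: a named selection of columns.
variable_set = ["Date", "Gas Production", "Temperature"]

# A record set: a named group of rules that every kept row must satisfy.
record_set_rules = [
    lambda r: r["Date"] >= date(2023, 1, 2),  # a time frame
    lambda r: r["Gas Production"] > 100.0,    # values greater than X
    lambda r: r["Valve1"] != "closed",        # values not equal to Y
]


def apply_sets(rows, columns, rules):
    """Keep the rows that pass every rule, restricted to the chosen columns."""
    return [
        {name: row[name] for name in columns}
        for row in rows
        if all(rule(row) for rule in rules)
    ]


selected = apply_sets(records, variable_set, record_set_rules)
```

With these sample values, only the record dated 2023-01-03 passes all three rules, and the `Valve1` column is absent from the result because it is not part of the variable set.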

In DATAmaestro you can also write your own formula expressions to define a Function Variable. For example, the Function editor lets you:

transform an output variable type from numerical to symbolic, or vice versa
compute the ratio of two explicit variables, such as pressure and temperature
calculate an average for similar variables, such as three temperatures measured on the same equipment.
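The three kinds of function variable listed above can be sketched in plain Python as follows. This is not the syntax of the DATAmaestro Function editor; the variable names (`Pressure`, `T1`, ...) and the helpers `to_symbolic` and `to_numerical` are hypothetical and only show the arithmetic behind each case.

```python
from statistics import mean

# One hypothetical record with explicit variables.
row = {"Pressure": 12.0, "Temperature": 48.0, "T1": 70.0, "T2": 72.0, "T3": 74.0}


def to_symbolic(value):
    """Numerical to symbolic: treat the number as a text label."""
    return str(value)


def to_numerical(symbol):
    """Symbolic to numerical: parse the label, or return None if it is not a number."""
    try:
        return float(symbol)
    except ValueError:
        return None


# Ratio of two explicit variables.
ratio = row["Pressure"] / row["Temperature"]

# Average of three temperatures measured on the same equipment.
average_temperature = mean([row["T1"], row["T2"], row["T3"]])
```

For the sample record, `ratio` is 0.25 and `average_temperature` is 72.0. A label such as "open" cannot be parsed as a number, so `to_numerical("open")` returns `None` in this sketch.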

About Data Sources

After you upload a file to the server and load it, the data source is dynamically linked to your project. Every time a data source project item is updated, all the dependent building blocks are updated automatically, for example, sets, graphics, and models.

Info: Data File Protection

Your data source file is never altered by the modifications you make to your DATAmaestro project. For example, if you change the name of a variable when you load the CSV file, it is saved as project information and does not change the data source file itself.

All the data sources you upload and the project information you create are secured and saved in a database. Only authorized users have access.

Info: Multiple Data Sources

A project can have one data source file, or many data sources. However, data from different data sources cannot be shared or merged for analysis inside DATAmaestro Analytics. If you want to combine data from two different sources, consider using DATAmaestro Lake to merge the data and then uploading the new file for your project.

When you create a new project, a new page asks you to select a data source.

There are two main ways to upload data to DATAmaestro and one way to select existing data.

Quick-start upload: If you have one CSV file ready for analysis, upload it directly to DATAmaestro Analytics and start working. The method is the same as for uploading data in DMLake.
Upload to Lake: If you have multiple CSV files to merge or one CSV file to resample, upload to DATAmaestro Lake.
Select from Lake: If you have already uploaded data to DATAmaestro Lake, select it here.

Quick-start Upload

If you have one CSV file ready for analysis, upload it directly to DATAmaestro Analytics and start working. The method is the same as for uploading data in DMLake.

Quick Upload [1 / 2] : Select your data

Click on Choose File and select the .csv file from your disk.
While uploading data from CSV, the software automatically detects the file format (Column Delimiter and Number Format) and provides a preview of the file content and how it is read by the system.
Check that the Column Delimiter and Number Format match the uploaded file.
Once a file has been selected, the CSV Preview attempts to interpret your data (date columns, numerical or symbolic variables). A different color is assigned to each column depending on its data type. Hover over the “eye” to display the original value below.

Check the number of header rows. If there are 3 header rows, click on + Add Header Row to add Row 2 and Row 3. Each header row can be defined as name, title, unit, description, or classifier, or it can be skipped (used when you don't want to upload a row). For example, define Row 1 as Name, Row 2 as Title, and Row 3 as Unit.

Info: Classifiers

Classifiers are metadata saved to a data file (DMFF or CSV). They classify variables according to different categories. Categories include type (symbolic/numerical), Parameter, Location, etc. For example, the variable “Steel plate thickness” can be classified as a Parameter defined as a “Dimension” and the variable “Chemical type” can be classified as a Parameter defined as a “Chemical”. For more information, please check Classify Variables.

Check that the characters are displayed correctly. If there is an odd symbol, click on Characters not displaying correctly? and try each Charset until the character displays correctly. The options are UTF-8, ISO-8859-1, Windows-1252, and Mac Roman. If you need more information about the different charsets, please check charset.
If you want to search for a variable, enter its name in the Filter variables field.
Click Next.
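The checks in this step (charset and column delimiter) can be reproduced outside the tool before uploading. The sketch below uses only the Python standard library; it is not how DATAmaestro performs its detection, and the function names and the sample bytes are assumptions. Note that ISO-8859-1 can decode any byte sequence, so a successful decode does not prove the charset is the right one; a visual check of the preview, as described above, is still needed.

```python
import csv

# Python codec names for the four charsets offered in the upload dialog.
CHARSETS = ["utf-8", "iso-8859-1", "cp1252", "mac_roman"]


def decode_with_first_valid_charset(raw, charsets=CHARSETS):
    """Return (text, charset) for the first charset that decodes without error."""
    for name in charsets:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate charset could decode the file")


def detect_delimiter(text, candidates=";,\t|"):
    """Guess the column delimiter among a few common candidates."""
    return csv.Sniffer().sniff(text, delimiters=candidates).delimiter


# Hypothetical file content: semicolon-separated, decimal commas, one accented header.
raw = "Date;Temp\u00e9rature;Pression\n2023-01-01;71,5;1,2\n2023-01-02;64,0;1,1\n".encode("cp1252")

text, charset = decode_with_first_valid_charset(raw)
delimiter = detect_delimiter(text)
```

Here the bytes are not valid UTF-8, so the first charset is rejected and the second one is used. The semicolon is picked as delimiter because it occurs the same number of times on every line, whereas the comma (used as decimal separator) does not appear in the header row.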

Quick Upload [2 / 2] : Define and verify variables as time, text or numbers

The system should detect columns as Time, Numerical or Symbolic variables. Check that columns have been correctly identified.
For each column, you can change the headers and the variable type by clicking on it.
Click on the first column to see and edit the additional information regarding this column (Name, Title, Type, Time Format, Time Zone). If the information is not correctly detected, please correct it manually.
Click Upload to. The file is uploaded to the DM project. You will see that the file is saved as a DMFF file.
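As an illustration of what "detected as Time, Numerical or Symbolic" can mean, the following sketch classifies a column from its raw text values. The rules and the default time format are assumptions chosen for the example, not the detection logic used by DATAmaestro.

```python
from datetime import datetime


def detect_type(values, time_format="%Y-%m-%d %H:%M"):
    """Classify a column as "Time", "Numerical" or "Symbolic" from its text values."""
    non_empty = [v for v in values if v.strip()]  # empty cells count as missing

    def all_parse(parser):
        try:
            for value in non_empty:
                parser(value)
        except ValueError:
            return False
        return bool(non_empty)

    if all_parse(lambda v: datetime.strptime(v, time_format)):
        return "Time"
    if all_parse(float):
        return "Numerical"
    return "Symbolic"
```

For example, `detect_type(["2023-01-01 00:00", "2023-01-01 00:30"])` yields "Time", `detect_type(["1.5", "", "2"])` yields "Numerical", and `detect_type(["open", "closed"])` yields "Symbolic". A column whose time stamps use another format would fall through to "Symbolic" here, which is the kind of case where the type has to be corrected manually.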

Upload to Lake

If you have multiple CSV files to merge or one CSV file to resample, upload to DATAmaestro Lake. This is equivalent to uploading data directly in DATAmaestro Lake. For more information, please check DATAmaestro Lake - Upload.

Select from Lake

If you have already uploaded data to DATAmaestro Lake, select it here. This is equivalent to exporting data out of DATAmaestro Lake. For more information about exporting from Lake, please check DATAmaestro Lake - Export.

Click on Data and then Select from Lake.
In the Properties tab, select the From and To dates. Make sure the dates correspond to the period of the CSV file.
Check the Interval box. The interval corresponds to how often the data is collected, e.g., 1 min, 30 min, 1 h.
Browse to a folder, then search for and select the tags to be visualized. Data from different directories can be merged into a single export file. To help select tags, there is a text search and filter option above the tag list (Name, Title and Unit).
Click on the small arrow to the right of the text search area to see the total number of tags, the number of tags selected, and the number of tags filtered.

Select the Method to be used. For more information about the Methods, please check DATAmaestro Lake - Export. It is also possible to use different methods within the same export file: change the method and then select the variable(s). The selected method is displayed beside the variable. More information on this topic is given below.
Select the tags to be exported. Once your tags are selected, click on the white arrow to move the tags to the column on the right. There you can verify all tags ready to be exported.
To remove tags from the selected tags list, click on the tag(s) you would like to remove and then click the left arrow.
In Var. naming, select the way the tag name should be exported. The default option is Tag Name Only, but it is also possible to export names with the Subpath (sub/tag) or the Full Path (/project/sub/tag).
File Name is pre-filled with a default name, which you can change.
Then, click Save.

Info: Combining methods

To select different methods for different variables, first select the method, then select the variables that should use that method and move them across to the right-hand side. Change the method and repeat the variable selection process. Alternatively, the method can be changed for each tag individually. In the list of selected tags, click on the method name beside the variable to see a list of all methods. Select a method from the list, and the method of the selected tag changes accordingly.

A window appears confirming the size of the data file and it asks if you want to proceed. Click Extract.
When the tags have been extracted, click on Load data in. Your data is in DATAmaestro Analytics!
You will be directed to Classify. For more information on how to classify a parameter, please check DATAmaestro Analytics – Classify Variables.
Optional: In the Parameters tab, you can click on Retrieve and then check all variable types, titles and units. You can also edit variable names, titles and units here.
Click Save.
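The three Var. naming options described in the steps above can be pictured with a small sketch. It assumes that a tag is identified by a path of the form /project/sub/tag, as in the example given for Full Path, and that the first path component is the project; the function `export_name` is hypothetical and not part of DATAmaestro.

```python
def export_name(full_path, naming="Tag Name Only"):
    """Build the exported variable name from a tag path such as /project/sub/tag."""
    parts = [p for p in full_path.split("/") if p]
    if naming == "Tag Name Only":
        return parts[-1]
    if naming == "Subpath":
        # Assumption: everything after the first component (the project).
        return "/".join(parts[1:])
    if naming == "Full Path":
        return "/" + "/".join(parts)
    raise ValueError(f"unknown naming option: {naming}")


names = {
    option: export_name("/project/sub/tag", option)
    for option in ("Tag Name Only", "Subpath", "Full Path")
}
```

With this input, the three options give `tag`, `sub/tag`, and `/project/sub/tag`. When tags from different folders share the same tag name, a naming option that includes more of the path keeps the exported names distinct.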

Info: Export data from Lake

You can export data to a DATAmaestro Analytics project directly from DATAmaestro Lake. However, the main advantage of selecting data from Lake in DATAmaestro Analytics is that it is saved directly in your current project.

Info: Edit data in Analytics

Now that you have a data source in your project, you can edit your data extraction, for example, to add or remove tags, update the time period, or change the sampling.

Edit data extraction

Your data extraction is stored in your Data Sources area. If you need to edit your selection or time period, or add new tags to your project, click on the Edit icon beside Data Extraction on the left bar.
In the selection area, you will find the Lake folder and subfolders. You can navigate to different folders to select variables. Select the new tags to be exported to the data source. Once you have selected your tags, click on the arrow to move the tags to the column on the right. There you can verify that all new tags are ready to be exported. Remember that if the new tags need to be exported using a different method, first select the method and then the tags.
Click Save.

Update data extraction

When the tags have been extracted, you are redirected to this page. There are two options:
To replace the existing data source (all tasks will be automatically updated): click on Load data in without changing the selected file in the drop-down (recommended).
To add a new data source to your project: select New datasource in the drop-down. It is possible to work with multiple data sources in a project; however, each task can only use one data source (no merging provided).

To replace the existing data source:

Replacing will automatically update all tasks within the project that use that data source.
Choose the data source name. NB: If the file name already exists, the file will be overwritten.
Confirm by clicking “Extract and replace data in existing datasource”.

Warning

Note that a DMFF file replaces any existing DMFF file with the same name. This means that if the same file is used in two different Analytics projects (for example, if you create a copy of a project) and the DMFF is updated in only one of them, the DMFF file is automatically updated in the other project too, because both projects refer to the same file name. The new DMFF therefore overwrites the existing one.

If you want to avoid overwriting, upload the new DMFF with a different name or change the name when extracting from the Lake.
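One simple way to pick a different name is to append a counter until the name is no longer taken. The sketch below only illustrates that idea; the helper `non_colliding_name`, the `.dmff` extension, and the list of existing names are assumptions, and DATAmaestro does not rename files for you in this way.

```python
def non_colliding_name(desired, existing):
    """Return `desired`, or a variant with a numeric suffix that is not in `existing`."""
    if desired not in existing:
        return desired
    stem, dot, extension = desired.rpartition(".")
    if not dot:  # no extension in the desired name
        stem, extension = desired, ""
    counter = 2
    while True:
        candidate = f"{stem}_{counter}{dot}{extension}"
        if candidate not in existing:
            return candidate
        counter += 1


# Hypothetical names already used by DMFF files on the server.
existing_names = {"plant.dmff", "plant_2.dmff"}
safe_name = non_colliding_name("plant.dmff", existing_names)
```

In this example `safe_name` is `plant_3.dmff`, so neither existing file would be overwritten.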

Look into your dataset

To open your file (table), click on the Data icon on the left vertical bar.
Check the Data tab, where the sampled data is found. Scroll to the right to check all variables.
Check the Summary tab, where basic statistics can be found at a glance. Scroll down to check all variables.

Filter the values in Summary tab and create a new variable set

To open the drop-down Filter option, click on the Filter icon in the column header. E.g., find the column “Number of missing” (values) and click on its Filter icon. A drop-down menu appears with a Text Filter. The goal here is to create a new variable set containing only the variables whose number of missing values equals zero, so tags with missing values > 0 are excluded from this set.
In the Text Filter box, select one of the options, e.g., “equals” with the value “0”. This keeps only the variables whose number of missing values equals zero.
Click OK.

Click the checkbox column to select all tags whose number of missing values equals 0. You can select them one by one, but to select all of them at once, click on the checkbox in the header.
Click on More Actions. In Variable Selection, select Variable set. Name this variable set “Selection of Tags with No Missing values”.

Info: Check data integrity

The first check you can do is on the data source Summary tab, where you can see basic statistics for all variables (average, min, max, st dev, nb missing and nb values) and quickly filter and sort variables. Typically, a quick check is done to remove variables with a high percentage of missing values and variables with zero standard deviation (constant values).
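The same integrity check can be sketched in Python for a small in-memory table. The column values, the helper names, and the choice of the population standard deviation are assumptions made for the example; the Summary tab may compute its statistics differently.

```python
from statistics import mean, pstdev

# Hypothetical columns; None marks a missing value.
columns = {
    "Temperature": [71.5, 64.0, 80.2, 75.1],
    "Pressure": [1.2, None, 1.1, 1.3],
    "Setpoint": [5.0, 5.0, 5.0, 5.0],
}


def summarize(values):
    """Basic statistics for one variable, in the spirit of the Summary tab."""
    present = [v for v in values if v is not None]
    return {
        "nb values": len(present),
        "nb missing": len(values) - len(present),
        "average": mean(present) if present else None,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "st dev": pstdev(present) if present else None,
    }


def select_variables(summary, max_missing_fraction=0.0):
    """Keep variables with few enough missing values and a non-zero standard deviation."""
    kept = []
    for name, stats in summary.items():
        total = stats["nb values"] + stats["nb missing"]
        missing_fraction = stats["nb missing"] / total if total else 1.0
        is_constant = stats["st dev"] is None or stats["st dev"] == 0
        if missing_fraction <= max_missing_fraction and not is_constant:
            kept.append(name)
    return kept


summary = {name: summarize(values) for name, values in columns.items()}
selected = select_variables(summary)
```

With the default threshold, only `Temperature` is kept: `Pressure` has one missing value and `Setpoint` is constant. Raising `max_missing_fraction` to 0.5 would also keep `Pressure`.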

Tip: How to check data?

For example, one way to check data is to plot one parameter as a histogram or a trend.

A second good check is to plot key variables like production rate or key pressures, temperatures, speeds, to look for problematic periods or outliers.

Info: White space

An underlined class of a symbolic variable indicates a white space (at the beginning, middle or end of the symbol). When creating Record sets based on symbolic variables with spaces, it is recommended to copy them directly from the Data Summary in order to capture all spaces.
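The reason for copying the class exactly is that a rule on a symbolic variable compares text character by character, so "OK" and "OK " are different classes. The short sketch below shows this with hypothetical class labels; the helper `has_white_space` is only illustrative.

```python
# Hypothetical classes of a symbolic variable, some containing white space.
classes = ["OK", "OK ", " OK", "Not OK"]


def has_white_space(symbol):
    """True if the class contains a space at the beginning, middle or end."""
    return any(character.isspace() for character in symbol)


# repr() makes leading and trailing spaces visible by adding quotes.
flagged = [repr(symbol) for symbol in classes if has_white_space(symbol)]

# A rule typed without the trailing space does not match the stored class.
rule_matches = ("OK" == "OK ")
```

Here three of the four classes are flagged, and `rule_matches` is `False`, which is why a rule typed by hand can silently select no records.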

Japanese

データ

オブジェクトと属性はデータ管理の中心的な要素です。このセクションでは、オブジェクトと属性の定義と選択、およびサブセットにおける構成について説明します。

データの基本

アップロードするデータソースで指定された行と列はプロジェクトの基本要素です。ファイルの列が属性になります。たとえば、日付、ガス生産量、バルブ1、バルブ2、温度などです。データソースのすべての属性を個別に使用できますが、すべての属性があらゆる分析で関連性を有しているわけではありません。DATAmaestro の一部の機能を最大化するために、属性セットを作成できます。属性セットは、データソースの使用可能なすべての属性から選択する列 (属性) のセットです。

データソースの各行はレコードと見なされます。単一のデータソースには、数百件、さらには数百万件ものレコードが含まれる場合があります。データを効果的に活用するには、時間範囲、X よりも大きい値、Y と等しくない値といった演算子を使用して、ルールを定義します。作成して名前を付けた各ルールセットは、オブジェクトセットと呼ばれます。1 つのプロジェクトには複数のオブジェクトセットを含め、さまざまな種類のデータ分析を対象にすることができます。

DATAmaestro では、独自の式を作成して、関数属性を定義することもできます。たとえば、関数エディターでは次のことができます。

出力属性型を数値から記号に変換、または記号から数値に変換する
圧力や温度といった 2 つの明示的な属性の比率関数を計算する
同じ装置で測定された 3 つの温度といった類似した属性の平均値を計算する

データソース

ファイルをサーバーにアップロードして読み込んだ後は、データソースが動的にプロジェクトにリンクされます。データソースのプロジェクト項目が更新されるたびに、セット、グラフィックス、モデルなどのすべての依存基本要素が自動的に更新されます。

データファイル保護

DATAmaestro プロジェクトを変更しても、データソースファイルは変更されません。たとえば、CSV ファイルを読み込むときに属性の名前を変更した場合は、その変更がプロジェクト情報として保存され、データソースファイル自体は変更されません。

アップロードするすべてのデータソースと作成するプロジェクト情報は保護され、データベースに保存されます。許可されたユーザーのみがアクセスできます。

複数のデータソース

プロジェクトには 1 つのデータソースファイルを割り当てるか、複数のデータソースファイルを割り当てることができます。ただし、異なるデータソースのデータを共有またはマージして、DATAmaestro 内で分析を行うことはできません。2 つの異なるソースのデータを結合する場合は、DATAmaestro Lake を使用して、データをマージしてから、プロジェクトの新しいファイルをアップロードすることを検討してください。

新しいプロジェクトを作成するときには、新しいページが表示され、データソースを選択する必要があります。

DATAmaestro にデータをアップロードするには主に 2 つの方法があります。1 つは既存のデータを選択する方法です。

クイックスタートアップロード: 1 つの CSV ファイルで分析準備が完了した場合、直接 DATAmaestro Analytics にアップロードし、作業を開始します。DMLake でデータをアップロードするのと同じ方法です。
Lake にアップロードする: 複数の CSV ファイルをマージするか、1 つの CSV ファイルをリサンプリングする場合は、DATAmaestro Lake にアップロードします。
Lake から選択する: すでにデータを DATAmaestro Lake にアップロードした場合は、ここでデータを選択します。

クイックスタートアップロード

1 つの CSV ファイルで分析準備が完了した場合、直接 DATAmaestro Analytics にアップロードし、作業を開始します。DMLake でデータをアップロードするのと同じ方法です。

クイックアップロード [1 / 2] : データを選択する

[ファイルの選択] をクリックして、ディスクから .csv ファイルを選択します。
CSV からデータをアップロードしているときには、ファイル形式 (列区切りおよび数値形式) が自動的に検出され、ファイルの内容とシステムでの読み取り方法が表示されます。
[列区切り文字] と[数値形式] をチェックして、アップロードされたファイルと同じことを確認します。
ファイルを選択した後、CSV プレビューはデータ (日付列、数値または記号変数) の解釈を試みます。列にはデータ型に応じて別の色が割り当てられます。「目」の上にカーソルを置くと、元の値が表示されます。
ヘッダー行数を確認します。3 つのヘッダーがある場合は、[+ ヘッダー行の追加] をクリックすると、行 2 と行 3 を追加します。ヘッダーは、名前、タイトル、単位、説明、分類子、行のスキップ (行をアップロードしないときに使用) として定義できます。行 1 を名前、行 2 をタイトル、行 3 を単位として定義できます。

分類子

分類子はデータファイル (DMFF または CSV) に保存されるメタデータです。さまざまなカテゴリに従って属性を分類します。カテゴリには型 (記号/数値)、パラメーター、場所などがあります。たとえば、変数「Steel plate thickness」は「Dimension」として定義されたパラメーターに分類できます。変数「Chemical type」は「Chemical」として定義されたパラメーターに分類できます。詳細については、「属性の分類」を確認してください。

文字が正しく表示されているかどうかを確認します。文字化けしている場合は、[文字が正しく表示されない] をクリックして、文字が正しく表示されるまで各文字セットをためしてください。オプション: UTF-8、ISO-8859-1、Windows -1252、Mac Roman。さまざまな文字セットの詳細については、「文字セット」を参照してください。
属性を検索する場合は、[フィルター属性] フィールドに名前を入力します。
[次へ] をクリックします。

クイックアップロード [2 / 2] : 変数を時刻、テキスト、または数値として定義して検証する

列は時刻、数値、記号変数として検出されます。列が正しく特定されていることを確認してください。
各列をクリックすると、ヘッダーと変数型を変更できます。
最初の列をクリックすると、この列に関する詳細情報 (名前、タイトル、型、時刻形式、タイムゾーン) が表示され、編集できます。情報が正しく検出されない場合は、手動で修正してください。
[アップロード先] をクリックします。ファイルが DM プロジェクトにアップロードされます。ファイル名が DMFF ファイルであることがわかります。

Lake にアップロードする

複数の CSV ファイルをマージするか、1 つの CSV ファイルをリサンプリングする場合は、DATAmaestro Lake にアップロードします。直接 DATAmaestro Lake にデータをアップロードするのと同じです。詳細については、「DATAmaestro Lake - アップロード」を参照してください。

Lake から選択する

すでにデータを DATAmaestro Lake にアップロードした場合は、ここでデータを選択します。Lake からデータをエクスポートするのと同じです。詳細については、「DATAmaestro Lake - エクスポート」を参照してください。
