DmA - DATAmaestro Analytics

Question

Answer

General

What is the easiest way to analyse multiple data sources?

Use the function in Microsoft Excel to merge your data, and then upload the combined file using the DATAmaestro Add-In that you can install for Excel. For more information, contact 70457104.

How can I rename a variable?

To rename a variable:

Open your project and click the Data sources icon in the sidebar.
Open the data source file that contains the variable and click Edit.
Rename the variable in the Variable definition section.
Click Load to update the variable name used throughout the project.

Where can I learn more about Lisp programming?

Lisp is an established programming language that is simple to learn. To learn the basics, you can visit LISP Tutorial or LISP for Beginners. For help writing specific expressions for your project, contact 70457104.

How can I rename my project?

There isn't a function to rename a project in DATAmaestro, but you can make a copy of your project, give it a new name and delete the original one.

What happens when I upload a data file that has the same name as an existing data source?

To avoid file confusion, DATAmaestro will prompt you to provide a new file name. Use your own naming convention to be able to distinguish your files.

How can I share my DATAmaestro project with colleagues?

Online collaboration is the most efficient way to share the same project workflow. To share specific project findings, you can also export any of the visualizations to PNG format.

Where can I learn more about algorithms?

Wikipedia is a great resource for information about many of the powerful algorithms used in DATAmaestro. You can search online to explore a topic and expand your understanding.

How can I create a folder?

This functionality is restricted to platform administrators. Please contact your local administrator or Pepite support team with authorisation to create a folder.

How do I change folders?

To navigate between folders:
- From within a project: click the DATAmaestro logo to return to the home page.
- From the home page, select the “eject” button at the top of the list of projects to view all available folders.
To move tasks/projects to another folder:
- This functionality is restricted to platform administrators. Please contact your local administrator or Pepite support team with authorisation to change a folder.

How do I create a new project?

There are two options to create a new project:

In the main DmA page, enter the New project name in the Create new project field.
In the project main menu select Project and then, New Project. The page is redirected to the main DmA page. Enter the new project name in the Create new project field.

How do I open a project?

There are two options to open a project:

On the main DmA page, search for a project in an existing folder under Open an existing project.
In a project, select Project from the top menu bar and then, Open Project. The page is redirected to the main DmA page.

How do I copy a project?

In the project main menu, select Project and then Make project copy. Optionally edit the new project’s name, then click Ok.

How do I delete a project?

In the project main menu, select Project and then Delete project.

How do I go back on an action/undo an action?

All actions are live within DATAmaestro, so there is no dedicated undo function. However, there are several options:

The standard browser forward/back buttons allow navigation between saved tasks.
While editing a task, before clicking save/view/run, the user may click back or click away from the editor screen to avoid saving changes.
Contact the Pepite support team to recover a backup.

I have accidentally deleted my project/work: how do I get it back?

Please contact the Pepite support team for more information.

What is the recommended resolution for best interface display in Google Chrome?

1680 x 1050

How can I change colors in DATAmaestro?

You can associate a color with a symbol and customize the color to be used for a Symbol. Otherwise, it is automatically selected by the browser.

On the top menu, in DATAmaestro Analytics project, click on “Project” then “Preferences”.
Click on “Add Entry”.
Enter the “Symbol” name. Note: Be careful not to type a space (Enter) just after the symbol name.
Click on color bar to choose a color (Basic or Custom colors).
Click “Save”.

Data

What file formats can I upload to DmA?

DmA accepts all Comma-Separated Values (.csv) file formats.

How do I save my file to .csv format?

From within Excel, select Save As and, from the list of file format options, select “CSV UTF-8 (Comma Delimited) (.csv)” (recommended) or “Comma Separated Values (.csv)”.

What is the maximum file size that I can upload to DmA?

There is no file size limit.

What happens when I change my data source?

When a data source is changed or updated in DmA, all tasks (functions, graphs, models, etc.) will automatically be updated based on the new data source.

Can I analyse two data sets/sources together?

No. Although you can have more than one data source per project, it is not possible to analyse them together.

What is a .dmff file?

DATAmaestro File Format (DMFF) is a custom file format for all DATAmaestro applications, designed especially to accelerate big data analytics.

How do I upload a file?

There are several options to upload a file:

To upload data directly into DATAmaestro Analytics (data directly ready for analysis), in the DmA main menu select the Data menu and then select Upload file. For more information, refer to the DmA User Guide.
To upload data into DATAmaestro Lake (for merging data or data aggregation), refer to DmL User Guide Import.
To upload data using an Excel plugin, please contact the Pepite support team for more information.

What kind of file can I upload?

It is possible to upload CSV and DMFF files in DmA.

Can I have symbolic/textual variables?

Yes, it is possible to have symbolic variables in the uploaded file.

Could I have several CSV/DMFF data files within the same DATAmaestro project?

Yes, a DATAmaestro project is not limited to a single CSV/DMFF data file. To add a CSV data file to your current project, use the CSV File option in the Data menu. Note that the CSV data file must already be present on the DATAmaestro server to use this option. If it is not, first use the Upload file option (in the Data menu) and then the CSV File option. Important: each CSV data file has its own list of records, so having several CSV data files in a DATAmaestro project does not allow you to merge them (for instance, you cannot draw a Scatter Plot whose X and Y axis variables come from different CSV data files).

Functions

Can I write equations/formula/functions?

Yes, use the Function variable editor to write equations in JavaScript, Python or R. DmA also gives access to the IF97 library. Access the Function variable editor from the Transform menu in the top menu bar.

My KPI variable in my database is a continuous variable and I would like to transform it into a 3-class variable (Quality levels: low medium high) using classes thresholds well known in my business domain. Must this transformation be done when preparing the CSV data file or can I achieve this transformation task directly in DATAmaestro?

You can do it directly in your DATAmaestro project. The Discretized variable option (in the Transform menu) allows you to create a symbolic representation of a numerical KPI. First give a name to your new variable, then enter the thresholds and the corresponding category names. For more information, refer to the DmA User’s Guide Transform.
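As a rough illustration of what such a discretization does, the sketch below maps a numerical KPI to three classes in plain Python. The threshold values (10 and 20), the class names and the `discretize` helper are hypothetical and are not part of DATAmaestro. How the tool treats a value exactly equal to a threshold is also an assumption here.

```python
from bisect import bisect_right

# Hypothetical thresholds and class names, chosen only for illustration.
THRESHOLDS = [10.0, 20.0]            # boundaries between the classes
LABELS = ["low", "medium", "high"]   # one more label than thresholds


def discretize(value, thresholds=THRESHOLDS, labels=LABELS):
    """Return the class label of a numerical value.

    Assumption: a value equal to a threshold falls in the upper class.
    """
    return labels[bisect_right(thresholds, value)]


kpi_values = [3.2, 10.0, 15.7, 25.1]
classes = [discretize(v) for v in kpi_values]
print(classes)
```

Two thresholds always produce three classes, so the list of category names must be one item longer than the list of thresholds.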

Variables and Records

What is a Variable?

Formerly known as an Attribute - Refer to the DmA User Guide Select for more information.

What is a Record?

Formerly known as an Object - Refer to the DmA User Guide Select for more information.

What is a Variable Set?

Formerly known as an Attribute Set - Refer to the DmA User Guide Select for more information.

What is a Record set?

Formerly known as an Object Set - Refer to the DmA User Guide Select for more information.

Visualization

How can I visualise my symbolic variables?

Visualise symbolic variables on the X axis of histograms or by adding them as a condition to histograms or scatter plots.

Can I draw a histogram of either symbolic or numerical variables?

Yes, a histogram can handle symbolic and numerical variables on both the X axis and as the conditional variable.

How can I see my symbolic variables as a Pareto rather than a histogram?

In the histogram properties choose Pareto as the Plot type.

Can I create a report?

Yes, it is possible to create a report. In the main menu, select Explore and choose Standard Report. In the editor you can create a Trend and several scatter plots. For more information, see the DmA User’s Guide - Standard Report.

What is “cond.”?

It stands for Condition and it is the number of buckets for the condition or number of discretized groups for a conditional variable.

How do I filter the data?

Data can be filtered by creating record sets. Refer to the DmA User’s Guide Select for more information on creating Record sets.

I know the normal variation range of a numerical variable and I would like to use these range limits to filter the values for this variable in the database, how can I do this in DATAmaestro?

The easiest way to do this is to create a new record set (through the Record set item in the Select menu) that contains only the values within the accepted variation range. First give a name to this new record set, then click the Add button and select the Filter instruction in the drop-down list. Select the variable whose values must be filtered and enter the range limit. Note that for a range interval you must use the Filter instruction twice: once for the lower limit and once for the upper limit.
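The effect of applying the Filter instruction twice can be pictured with the plain Python sketch below. The records, the `temperature` variable and the limits are hypothetical, and treating both limits as inclusive is an assumption, not documented DATAmaestro behaviour.

```python
# Hypothetical records and limits, for illustration only.
records = [
    {"id": 1, "temperature": 48.0},
    {"id": 2, "temperature": 55.5},
    {"id": 3, "temperature": 71.2},
    {"id": 4, "temperature": 64.9},
]
LOWER, UPPER = 50.0, 70.0

# First filter instruction: keep values at or above the lower limit.
above_lower = [r for r in records if r["temperature"] >= LOWER]

# Second filter instruction: keep values at or below the upper limit.
record_set = [r for r in above_lower if r["temperature"] <= UPPER]

print([r["id"] for r in record_set])
```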

The date does not appear correctly: how come?

Edit the data source, locate the date/time variable and in the Units column enter one of the following according to the temporal units:

excelTime
millisUnixTime
unixTime
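To check which unit matches your data, it can help to convert a sample value by hand. The sketch below uses the conventional meaning of each unit name (seconds or milliseconds since 1970-01-01 for Unix time, days since 1899-12-30 for the Excel 1900 date system). That DATAmaestro interprets the units in exactly this way is an assumption, and `to_datetime` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Day 0 of the Excel 1900 date system (the Windows default).
EXCEL_EPOCH = datetime(1899, 12, 30, tzinfo=timezone.utc)


def to_datetime(value, unit):
    """Convert a numerical timestamp to a UTC datetime."""
    if unit == "unixTime":          # seconds since 1970-01-01
        return datetime.fromtimestamp(value, tz=timezone.utc)
    if unit == "millisUnixTime":    # milliseconds since 1970-01-01
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    if unit == "excelTime":         # days since 1899-12-30
        return EXCEL_EPOCH + timedelta(days=value)
    raise ValueError(f"unknown unit: {unit}")


# The same instant (15 July 2017, 12:00 UTC) expressed in the three units.
print(to_datetime(1500120000, "unixTime"))
print(to_datetime(1500120000000, "millisUnixTime"))
print(to_datetime(42931.5, "excelTime"))
```

If a value converted with one unit gives a date decades away from what you expect, the column is probably stored in one of the other units.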

How can I compare the temporal dynamic behavior/trend of two or more numerical variables of my database?

You can do this using the Trends option in Explore menu. Keep in mind that the X axis must be a time related variable available in your database (typically, a numerical timestamp variable or an index variable).
The Trends feature allows multiple variables on the first or second Y axes.

Dendrogram

What is a negative (red) correlation?

A negative correlation or inverse correlation indicates the degree to which one variable increases as the other decreases.

What is a positive (green) correlation?

A positive correlation or direct correlation indicates the degree to which the variables increase or decrease in parallel.
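For readers who want to see the sign convention in numbers, here is a small sketch that computes the Pearson correlation coefficient in plain Python. The variable names and values are made up, and whether the Dendrogram tool uses exactly this coefficient is an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


load = [1.0, 2.0, 3.0, 4.0]
power = [2.1, 3.9, 6.2, 8.0]        # rises with load: positive correlation
efficiency = [9.0, 7.5, 5.1, 3.0]   # falls as load rises: negative correlation

print(pearson(load, power))         # close to +1
print(pearson(load, efficiency))    # close to -1
```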

Models

How do I choose the best model for my analysis?

The type of model depends on the analytics objective:

Supervised or unsupervised learning (Prediction, exploration, monitoring)
Type of input parameters (numerical or symbolic)
For supervised learning, the type of output function to predict (regression or classification)

For more information, please contact the Pepite support team.

Is there a limit to the numbers of models I can create?

No, there is no technical limit and multiple models can be created in parallel. Use the More Actions > Clone As function to make a copy of a model.

When do I use a classification model?

A classification model should be used when the output variable (or goal variable) is a symbolic variable (discrete values) eg. High-Medium-Low, On-Off, Good-Bad.

When do I use a regression model?

A regression model should be used when the output variable (or goal variable) is a numerical variable (continuous values).

How can I be sure that my model is reliable?

There are several steps to take to ensure a reliable model. Create independent learning and testing sets, evaluate the mean square error and R2 values, visualise the prediction versus the output or goal variable. For more information, contact the Pepite support team.
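The two metrics mentioned above can be computed as in the following sketch. The actual and predicted values stand for a hypothetical testing set that was not used for learning. The helper functions are illustrative and are not DATAmaestro functions.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance of the actual values explained by the model."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical testing set: measured output and model prediction.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 11.5, 14.5, 15.5]

print(mean_squared_error(actual, predicted))
print(r_squared(actual, predicted))
```

A low mean square error together with an R2 close to 1 on the testing set suggests the model generalises beyond the data it was trained on.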

How do I choose the model parameters?

Please check DmA User’s Guide for more information.

How can I assess in DATAmaestro the dependence level between the variables of my database?

Several tools are available in DATAmaestro that may help:

Dendrogram tool (via the Explore menu): This tool produces both the Dendrogram view (a hierarchical clustering tree view) and the Correlation matrix (a table giving the correlation factor for each pair of variables). Note that the Dendrogram tool only works for numerical variables. With a specific KPI variable in mind, the Dendrogram outputs show which variables seem to be linked to your KPI (that is, the variables that may influence it). Tip: Always confirm or refute the physical reason behind the highest correlation factors by drawing Scatter Plot graphs for the corresponding pairs of variables. For more information, refer to the DmA User’s Guide - Dendrograms.
Extra trees tool (found in the Models menu): This is an interesting alternative. It produces a Pareto graph of the most significant variables with respect to your KPI or output variable. For more information, refer to the DmA User’s Guide Models.

How can I check that my database contains quite homogeneous records (that is, records rendering a quite stable process with only one production regime) or, conversely, heterogeneous records (that is, records rendering transient periods and/or several production regimes)?

DATAmaestro proposes two interesting tools to help do this verification: PCA and K-Means.

The Principal components analysis tool (reached via the Transform menu) highlights the variability present in your database. Creating 2 components and drawing them in a Scatter Plot graph is an easy way to confirm the homogeneous nature of your records (records typically form a single, fairly compact cloud of points in the Scatter Plot) or, conversely, their heterogeneous nature (records form a widely spread cloud of points and/or several distinct clouds of points). For more information, refer to the DmA User’s Guide - Transform.
The K-Means tool (in the Models menu) tries to split your database into several (2, 3 or more) groups of homogeneous points named clusters. For more information, refer to the DmA User’s Guide - Clustering models.
Note that both PCA and K-Means only work for numerical variables.

How can I export my models?

Once the model is created, you can select the Export function in More Actions. Note that the Export function generates a function to be integrated in external environments, such as Excel reports. Not all models have this feature. For more information, please check the DmA User’s Guide Models.

Can I add “constraints” or rules to my decision tree based on certain parameter constraints?

Users can edit the tree, force decisions to node or vice versa.

Export the function and adapt the rules based on constraints.
Create a record set to remove data not satisfying constraints.
Remember this Decision Tree is based on historical data so only covers events that have really occurred, therefore, if constraints always existed, the data should not differ from those constraints.

Exports

In my DATAmaestro project, I have applied lots of transformations to my database (outlying and abnormal records were filtered, new variables were created, interesting predictive models were built, etc.). I would like to export a subset of this transformed database for reporting needs. Is it possible?

Absolutely, you can export the data subset of interest using the Export data in the Reports menu. You only have to select both the record set and the variables that define your wanted subset, then click on View button, and finally click the Download item proposed in the More actions list. The exported data subset is a CSV file you can exploit using Excel or other usual reporting tools.

How can I export my analysis (graphs)?

Once the graph is created you can select Export graphic via the More Actions menu (top right hand corner). There are three formats: SVG, PNG and PDF.

DmL - DATAmaestro Lake

General

What is a tag?

A tag is the name given to a variable, parameter or column within the DATAmaestro Lake and other common historians. It is said that values are recorded for one same tag during a period of time.

Can I calculate functions in DmLake?

Yes, there is an advanced feature for Admin users to calculate “Computed tags”. For more information, contact your system administrator or the Pepite Support team.

What does “View data” do?

The View tab gives access to 3 features that are handy to quickly explore the data in DmLake:

Trends - allows the user to visualise trends of previously.
Raw data - allows the user to view the recorded values of up to 16 different tags. The values displayed correspond to the lowest frequency of the selected tags.
Statistics - provides useful statistical information about the data stored in the DmLake, such as the time period a tag has been recorded, the number of recorded values, the number of missing values, mean, min, max, etc.

What is the recommended resolution for best interface display in Google Chrome?

1680 x 1050

Log-in

What do I do if I forgot my password?

Contact the Pepite Support team.

What are the different log-in options (Google, Microsoft, Pepite)?

Pepite account (default): account created by Pepite unique to DATAmaestro. Passwords are managed by Pepite.
Google/Microsoft: account based on an existing email account. The user must be logged into Google/Microsoft in order to access DATAmaestro via this authentication method. Passwords are managed by Google/Microsoft.

How do I create a new account?

To create a new account, contact the Pepite Support team.

How can I reset my password?

Pepite account: From the login page select Pepite Account, enter your username and existing password, then below select “Change Password”. Two fields will appear: New Password and Repeat New Password. Enter your new password in both fields and select login. You have now updated your password.
Google account: Please follow the standard process to update your Google Account or Gmail password. Once completed, login to Gmail or Google Account with this new password and you’ll be able to login to DATAmaestro.
Microsoft account: Please follow the standard process to update your Microsoft password. Once completed, login to Microsoft Account with this new password and you’ll be able to login to DATAmaestro.

My Google account does not work; what do I do?

Please contact the Pepite Support team.

My Pepite account does not work; what do I do?

Please contact the Pepite Support team.

Import

What data file formats can I upload to DmLake?

DmLake accepts Comma-Separated Values (.csv) file formats.

How do I save my file to .csv format?

From within Excel, select Save As and from the list of file format options select “CSV UTF-8 (Comma Delimited) (.csv)” (recommended) or “Comma Separated Values (.csv)”
It is recommended to remove all formula from your Excel sheet before saving as a CSV file.

How do I upload data files to DmLake?

Once your file has been saved in CSV format, connect to the DmLake using your DATAmaestro account (Pepite, Google or Microsoft Account). Then go to the Import menu. For more information, see Import data in the DmLake User's Guide Import.

How do I connect my historian / my datasource to DmLake?

For automatic data streaming from historians / datasources to DmLake, a custom DmCollector must be implemented. Please contact the Pepite Support Team for more information.

In what format does my date column need to be?

The following date formats are currently supported:

Excel time
Excel time for Mac
Unix (s)
Unix time (ms)
Text time (eg. 10/12/2017 10:25)

Note: if you use “Excel time” or “Excel time for Mac”, it is recommended to ensure that sufficient decimal places are displayed in the cells before saving as a .csv file (12 decimal places recommended) so that the precise and correct time is uploaded into the DmLake.
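The reason decimal places matter is that an Excel time counts days, so one second is only about 0.0000116 of a unit. The sketch below, with a hypothetical helper name, estimates the largest timestamp error introduced when a serial date is rounded to a given number of decimal places.

```python
SECONDS_PER_DAY = 86_400


def max_rounding_error_seconds(decimals):
    """Largest time error when an Excel serial date is rounded to `decimals` places."""
    return 0.5 * 10 ** -decimals * SECONDS_PER_DAY


for decimals in (2, 5, 12):
    print(decimals, "decimal places ->", max_rounding_error_seconds(decimals), "s")
```

With only 2 decimal places a timestamp can be off by several minutes, while 12 decimal places keep the error far below a microsecond.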

What is the maximum file size that I can upload to DmLake?

There is no technical file size limit.

Can I merge data in DATAmaestro Lake? For example, I have monthly data for analysis (in different files), how can I consolidate this data into one file?

DATAmaestro Lake is a temporal database which allows data to be consolidated based on date and time. To consolidate data when each file corresponds to a different month, upload the data in chronological order into the same DM Lake folder, ensuring that the variable names match across files. DmLake will ensure that the data from each month is merged into one complete dataset. Refer to the DmLake User Guide for more information on uploading data.
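Conceptually, this consolidation amounts to stacking records by timestamp once the variable names agree. The sketch below is only a mental model written in plain Python with two made-up monthly files. It is not how DmLake is implemented, and the column name `time` is an assumption.

```python
import csv
import io

# Two hypothetical monthly files with identical variable names.
january = "time,flow,temp\n2017-01-31 23:00,12.1,80.2\n"
february = "time,flow,temp\n2017-02-01 00:00,12.4,80.0\n"

merged = {}
expected_names = None
for content in (january, february):      # processed in chronological order
    reader = csv.DictReader(io.StringIO(content))
    if expected_names is None:
        expected_names = reader.fieldnames
    elif reader.fieldnames != expected_names:
        raise ValueError("variable names differ between files")
    for row in reader:
        # One entry per timestamp, holding the values of every variable.
        merged[row["time"]] = {k: v for k, v in row.items() if k != "time"}

print(merged)
```

The header check mirrors the advice above: if a variable is spelled differently in one month, it would end up as a separate variable instead of continuing the same series.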

Importing my file does not work; common mistakes to take a look at:

Check that the file is saved as a Comma Separated Values (CSV) file
Check your date format - refer to “In what format does my date column need to be?”
Check that the data is in chronological order
Activate “Skip Existing Values” if you could not upload the data in chronological order

What is the charset?

Charset or Character Set or Character Encoding indicates to the web browser which text format to use, in order to display an HTML page correctly.

What is the delimiter?

The delimiter is the symbol used to separate columns in a .csv file. A comma is generally used, but other common delimiters include semicolons, forward slashes, etc.
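The sketch below shows why the delimiter setting matters, using Python's standard `csv` module on a made-up file that uses semicolons (a common choice when the comma serves as decimal separator). The file content is hypothetical.

```python
import csv
import io

# Hypothetical file content using a semicolon as delimiter.
raw = "timestamp;temperature;pressure\n2017-07-15 10:00;71,5;1,02\n"

# Correct delimiter: three columns per row.
rows = list(csv.reader(io.StringIO(raw), delimiter=";"))
print(rows[1])

# Wrong delimiter: the header collapses into one column and the
# decimal commas split the values in the wrong places.
wrong = list(csv.reader(io.StringIO(raw), delimiter=","))
print(wrong[0], wrong[1])
```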

What is the difference between “Europe/Brussels” time zone and “+02:00” time zone?

Europe/Brussels is an example of a Region time zone. This format should be used when your data comes from a system that shows the local clock time, including daylight savings changes.
+02:00 is an example of a Fixed Offset. This format should be used when data comes from a system not taking into account daylight savings changes, not displaying the wall-clock time.
NB: These options are equivalent for time zones where there are no daylight saving time changes.
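The difference can be seen by converting the same wall-clock reading with both kinds of time zone. This sketch uses Python's standard `zoneinfo` module (Python 3.9 or later, with a time zone database available on the system). The `to_utc` helper and the sample dates are illustrative only.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

region = ZoneInfo("Europe/Brussels")     # region time zone, follows daylight saving
fixed = timezone(timedelta(hours=2))     # fixed offset +02:00, never changes


def to_utc(wall_clock, tz):
    """Interpret a naive wall-clock reading in `tz` and convert it to UTC."""
    return wall_clock.replace(tzinfo=tz).astimezone(timezone.utc)


summer = datetime(2017, 7, 15, 10, 0)
winter = datetime(2017, 1, 15, 10, 0)

for reading in (summer, winter):
    print(reading, "| region:", to_utc(reading, region).time(),
          "| fixed:", to_utc(reading, fixed).time())
```

In summer both settings give the same instant, but in winter the fixed offset places the reading one hour away from what the local clock actually showed.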

What does the “Skip Existing Values” option do?

If other data (for a given time) has already been uploaded in a certain folder, then a new import in the same folder will skip the existing values (not replacing them).

I have an error “Date saving (complete)”, what do I do?

Activate Skip Existing Values and select Load. In this case, existing values will not be re-uploaded or replaced.

What is the difference between Append mode and Reset mode?

Append mode will attach new data to the folder (new rows or new columns) without replacing previously uploaded information.
Reset will replace existing information (for any given tag) as well as entering new data.

Example: suppose you have already uploaded your production data for January 2016 and then decide to upload production data for all of 2016. Reset mode will first delete the January 2016 data and then upload all production data for 2016.

What is a folder in the Lake?

A DM Lake folder is equivalent to a folder on your computer. To keep things organised, it is recommended to upload your data into directories based on:

1/ Location (Site, Country, Process Unit, Equipment)

2/ Data source (Historian, LIMS, etc.)

3/ Data type (Manual upload, DmCollector automatic data uploads)

What is the difference between a “Base folder” and “New folder”?

A “Base folder” is an existing “folder” within DmLake. A “New folder” simply indicates a sub-folder within that “base folder” location.

Can I upload multiple files at once?

No, as exact settings must be provided for each file, it is not possible to upload multiple files at once. For support, please contact the Pepite Support Team.

Trends

Why is the start date blocked?

When the Range is defined (1 week, month, year), only the end date can be modified and the start date adapts to respect the requested Range.

What does method do?

Different methods will slightly change the data visualisation. For example, when average is selected, the data shown represents average values. However, if the method minimum is selected, the data shown represents the minimum recorded values.

How do I add variables to my trends?

On the bottom-left, select the “+ Add trends” button to select variables. For more information refer to the User Guide.

Can I visualise variables from different directories?

Yes, variables from different directories (folders) and even with different sampling rates can be visualised together, providing you have the access rights to those different directories.

Why does my variable have the value “Infinity”?

Infinity indicates that no data is available for the variable during the selected time range. Zoom out or move forwards-backwards in time to see if any data has been recorded for your variable.

How do I zoom out?

Continue to click on the magnifying glass in the bottom-left corner, next to the “+ Add trend” button.

Is it possible to save trend graphs in DmLake?

This will be possible using DATAmaestro Dashboards (DmD). Refer to the DmD User Guide.

Can we save templates of trend graphs in DmLake?

At this stage this functionality is not available. This will be released in a future revision of DmLake.

Raw data

Do I have to provide a Start Date?

No, if no start date is provided, the raw data is displayed from the first data point.

How can I see raw data for variables from different directories?

Select the folder from the drop down list on the left hand editor, select the variables required and then click the right hand arrow to move them to the right hand editor. Repeat for each folder of interest. For more information refer to the DmL User's Guide.

What does the date mean (e.g. +02:00)?

This indicates the timezone (eg. +02:00) of the data. For example, 2017-07-15 10:00:00.000 +0200 indicates 10:00am on the 15th July 2017 in Central Europe.

Statistics

Do I have to select a date range?

No, if no date range is selected, the statistics will cover the entire range of data available for the selected variables.

Why do I have a missing count?

A missing count greater than zero indicates the number of data points that were not recorded. This may occur when data is uploaded with empty cells or when data historians record a missing value.
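Before uploading, you can estimate the missing count per variable yourself. The sketch below counts empty cells in a made-up CSV extract with Python's standard `csv` module. Treating only empty cells as missing, and the column name `time`, are assumptions.

```python
import csv
import io

# Hypothetical extract with some empty cells.
raw = (
    "time,flow,temp\n"
    "10:00,12.1,80.2\n"
    "10:01,,80.0\n"
    "10:02,12.4,\n"
    "10:03,,79.8\n"
)

missing = {}
for row in csv.DictReader(io.StringIO(raw)):
    for name, value in row.items():
        # Count a cell as missing when it is empty or only whitespace.
        if name != "time" and value.strip() == "":
            missing[name] = missing.get(name, 0) + 1

print(missing)
```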

Export

What is the difference between the snap, average, etc. methods?

The sampling method defines the aggregation (if any) to perform on the data before extracting. The table below outlines each method type:

Method	Description
Snap	An instantaneous value at each time interval will be extracted, no aggregation is provided
Average	Mathematical mean value for each Period or time window
Interpolation	Mathematical interpolation between previous and next values
Most probable	Selects the most frequently recorded value for each Period or time window (useful for symbolic tags)
Minimum	Selects the lowest/minimum recorded value for each Period or time window
Maximum	Selects the highest/maximum recorded value for each Period or time window
Raw	Exports raw values.
Last	Selects the last recorded value at or before a given time.
First	Selects the first recorded value at or after a given time.

For more information, please check Lake - Export.
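To make the aggregating methods more concrete, the sketch below groups a small, made-up series into 60-second windows and applies four of the methods from the table. How DmLake aligns its windows and breaks ties is not stated here, so those details are assumptions. Snap, Interpolation, Raw, Last and First are left out for brevity.

```python
from collections import Counter, defaultdict
from statistics import mean

# (time in seconds, value) pairs; the values are hypothetical.
samples = [(0, 10.0), (20, 12.0), (40, 12.0), (60, 20.0), (90, 16.0), (110, 16.0)]
PERIOD = 60  # window length in seconds

# Group the values by the start time of the window they fall into.
windows = defaultdict(list)
for t, value in samples:
    windows[t // PERIOD * PERIOD].append(value)

METHODS = {
    "average": mean,
    "minimum": min,
    "maximum": max,
    "most probable": lambda values: Counter(values).most_common(1)[0][0],
}


def aggregate(method):
    """Return one aggregated value per window for the given method."""
    return {start: METHODS[method](values) for start, values in sorted(windows.items())}


for name in METHODS:
    print(name, aggregate(name))
```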

Why is my start date frozen?

When the Range is defined (1 week, month, year), only the end date can be modified and the start date adapts to respect the requested Range.

Why can I not extract raw data?

This functionality is currently restricted. Contact the Pepite Support Team for more information.

I have data with different frequencies and from different sources: how can I merge them together?

All the data can be uploaded to DmL into the same or different directories.
Then the DmL export function can be used to extract data with different frequencies and from different sources for analysis.

Can I export data from multiple directories?

Yes, variables from different directories (folders) and even with different sampling rates can be exported together, providing you have the access rights to those different directories. The dates, sampling method and period allow the data to be aligned in a simple table format ready for analytics.

What is a .dmff file?

DATAmaestro File Format (DMFF) is a custom file format for all DATAmaestro applications, designed especially to accelerate big data analytics.

Can I export a .csv file?

To export a .csv file, first the data must be exported to DATAmaestro Analytics and then it can be downloaded as a .csv file.

Is there a maximum number of tags that I can extract at once?

No, there is no technical limit.

Is there a limit on the time range that I extract?

No, there is no technical limit.

I don’t have access to a Folder: what do I need to do?

Please contact the Pepite support team with authorisation from the folder owner.

What is the difference between a “folder” and a “project”?

A folder groups a series of projects in Analytics or tags in the Lake, while a project is equivalent to a file within a folder.

How can I delete data?

This functionality is restricted to platform administrators. Please contact your local administrator or Pepite support team with authorisation from the folder owner to delete data.
If you are an administrator, delete data via the Admin -> Tag management menu and refer to the DmL User's Guide Tag Management for more information.

How can I move variables to another folder?

This functionality is restricted. Please contact the Pepite support team with authorisation from the folder owner to move variables.

How many variables can I have per folder?

There is no technical restriction. However, for optimal performance, it is recommended to create sub-directories to organise data, limiting each folder to 2000 - 3000 variables.

How can I set up regular recurring exports (i.e. automated extractions)?

This functionality is restricted to platform administrators. Please contact your local administrator or the Pepite support team.
If you are an administrator, create recurring exports via the Admin -> Exports to DATAmaestro Analytics menu and refer to the /wiki/spaces/DATMA/pages/70660093 for more information.

Can I export to third party platforms?

Several options are available to export to third party platforms (Excel, Python, R, etc.). Please contact the Pepite support team for more information.

What does the “download” button do?

The download button will download a .dmff file of your selected data and date ranges on your local computer drive.

What period should I use?

The period, also known as the sampling rate, depends on the frequency of your selected variables. If your selected variables have daily values, a period of less than one day (1d) will not be useful. The period also depends on the dynamics and inertia of the process to analyse. If your selected variables are measured every second but the process changes occur slowly over several hours, a period of seconds or minutes will not be appropriate. Select the period that best suits your variables and process dynamics.

What method should I use?

Select a method based on the type of process and analysis to perform. For example, to analyse energy efficiency, typically the average method is selected.

How do I format my period?

There are two options to select the period:

Enter the period value in milliseconds (eg. 5 minutes = 300000)
Enter the short-hand format (eg. 5 minutes = 5m)
Refer to the DmL User's Guide for more information.
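The relationship between the two notations can be sketched as a small converter. Only the suffixes `m` (minutes) and `d` (days) appear in this FAQ. The `s` and `h` suffixes and the `period_to_millis` helper are assumptions added for illustration, so check the User's Guide for the suffixes DmL actually accepts.

```python
import re

# Milliseconds per unit. "m" and "d" come from this FAQ;
# "s" and "h" are assumptions added for completeness.
UNIT_MS = {"s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def period_to_millis(period):
    """Convert a period such as "5m" or 300000 to milliseconds."""
    text = str(period).strip()
    if text.isdigit():                      # already in milliseconds
        return int(text)
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unsupported period: {period!r}")
    return int(match.group(1)) * UNIT_MS[match.group(2)]


print(period_to_millis("5m"))
print(period_to_millis("1d"))
```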
