The Quick Start section is designed to help you get up and running fast. It describes how to create a simple project and how to use the core tools and functionality, so that you gain a basic understanding of DATAmaestro.

Create a New Project

The data mining process is a complex sequence of tasks ranging from data selection and exploration to knowledge extraction and reporting. In DATAmaestro, this process takes place in the context of a project.

To start mining and analyzing data, you must first create a new project and specify the data source. The project is saved inside a project model, which allows you to save various models, graphical items, data sets and variable sets.

To create a new project:

  1. Enter your sign in information and click Sign In.
  2. On the DATAmaestro welcome page, enter a name for your project.
  3. Confirm that the name of your new project appears in the top left corner of the home page.

Info: What is a project?

In DATAmaestro, a project is any analytical investigation conducted against data source(s). Project names often represent an area of study, for example, the name of an industrial plant or a division within it.

Use your own naming convention when you create new projects. After you create a project, you can make a copy of it and reuse the elements for a new project.

Upload a Data Source File

Data files must be uploaded to the DATAmaestro server before you can load them in your DATAmaestro project. You can assign one or more data source files to a project, but information cannot be shared between files.

You can upload data source files in CSV or Excel format. Ensure your data source file is accurate and well formatted, because your project depends on it. Anomalies and missing data can affect the output.

Tip: Microsoft Excel add-in

To upload Excel files, you must install a DATAmaestro Add-In. Contact Technical Support for assistance. If required, you can use Excel to merge data before you upload a file to DATAmaestro.

To load a CSV file:

  1. Click Data > Upload file on the menu, then click Browse to locate the CSV file on your computer.
  2. Click Upload to copy the file to the DATAmaestro server.

  3. Click Load as dataset to load it to your project. 

    Tip: Files already uploaded?

    If your file is already uploaded but not loaded to your project, click Data > CSV file, then click Browse to locate the file on the DATAmaestro server.


  4. Select the Delimiter method that was used to create the file.
  5. If the Variables are defined in the header, select the check box. Uploading this information can be useful when you define the data type.

  6. Select the decimal separator for the number format in your CSV file: a period (.), or a comma (,).
  7. Select a Character encoding standard: utf-8 or ISO 8859-1.
  8. Click Retrieve to view the variable names and their data type.
  9. Review the variables and ensure the Type of each one is correct: either numerical or symbolic. If required, you can rename variables to give them more meaning.
  10. Click Load.
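To make the load options above more concrete, the following Python sketch reads a small CSV payload using a delimiter, a header flag, a decimal separator and a character encoding, then guesses whether each variable is numerical or symbolic. It only illustrates what these options mean. It is not how DATAmaestro parses files, and the function names, the default variable names (var1, var2, ...) and the sample data are all hypothetical.

```python
import csv
import io


def parse_number(text, decimal_sep="."):
    """Return a float if text is a number in the chosen format, else None."""
    text = text.strip()
    if decimal_sep == ",":
        if "." in text:
            return None  # a period is not a valid decimal mark in this format
        text = text.replace(",", ".")
    try:
        return float(text)
    except ValueError:
        return None


def load_csv(raw_bytes, delimiter=",", has_header=True,
             decimal_sep=".", encoding="utf-8"):
    """Decode and parse CSV bytes; return (names, types, records).

    Hypothetical helper: a variable is guessed to be "numerical" when every
    non-empty value parses as a number, otherwise "symbolic".
    """
    text = raw_bytes.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return [], [], []
    if has_header:
        names, records = rows[0], rows[1:]
    else:
        # No header row: invent placeholder variable names.
        names = [f"var{i + 1}" for i in range(len(rows[0]))]
        records = rows
    types = []
    for col in range(len(names)):
        values = [r[col] for r in records if col < len(r) and r[col].strip()]
        numeric = bool(values) and all(
            parse_number(v, decimal_sep) is not None for v in values
        )
        types.append("numerical" if numeric else "symbolic")
    return names, types, records


# Made-up sample: semicolon-delimited, comma decimals, ISO 8859-1 encoded.
sample = "name;temp\nA;1,5\nB;2,25\n".encode("iso-8859-1")
names, types, records = load_csv(
    sample, delimiter=";", decimal_sep=",", encoding="iso-8859-1"
)
```

Choosing the wrong decimal separator in this sketch makes the temp column fall back to symbolic, which is the kind of mismatch the review in step 9 is meant to catch.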

Tools for the Job

Depending on your project objectives, you can use a combination of different tools and methods. Knowing which tool to use will simplify the process. The following table provides some typical inquiries and an entry point for you to start working with your data:

| Objective | Your Question | Method(s) | Considerations |
| --- | --- | --- | --- |
| Evaluate seasonal production rates. | How can I see my production rates over the last five years? | Trends | Create a record set with a rule to define the five-year period. Then create a Trend using the variable that represents the production rate. |
| Assess the dependence level between the variables in a database. | How can I see the relationships for a specific KPI? | Dendrograms | This tool produces two views: a Dendrogram, which summarizes the variable groups that have a high correlation frequency; and a Correlation matrix, which is a table giving all the correlation factor values, one for each pair of variables. |
| Filter the data based on variable range limits. | My normal variation range is less than 1100. How can I use this information to filter my data? | Record sets | Create a new record set using the "filter" rule and apply your control values (<1100) to the variable. |
| Change the nature of a subset of data, for example, change numbers into symbols. | How can I convert a continuous variable in my database into quality levels? | Functional variable | Create a Function variable to transform your numerical KPI into symbolic representations, for example, low, medium and high. |
| Export images and data. | How can I use my analysis in a presentation to my team? I would also like to use the data in a report. | Export to PNG, Reports | You can export any of the visualizations in PNG file format. You can also export subsets of your data for reporting purposes. |
| Show a correlation between variables. | How can I confirm how two variables are correlated? | Scatter Plots, Dendrograms | A scatter plot can show you how two variables are correlated. Correlations may be positive (rising, dots slope from lower left to upper right), negative (falling, dots slope from upper left to lower right) or absent (uncorrelated). |
| Understand the nature of the records in your database. | How can I find out if my database contains records that render a stable process with one production regime (homogeneous records), or transient periods and numerous production regimes (heterogeneous records)? | PCA, K-Means | Use PCA to identify the variability in your database. Create 2 components and draw them with a Scatter Plot. If the records are homogeneous, they typically form a compact cloud. If they are heterogeneous, the cloud is extended or breaks into distinct points. Use K-Means to try to split your database into several homogeneous clusters. If the method gives poor results, the records in your database are homogeneous, i.e. difficult to split. Conversely, if the method gives good results, your database records can be organized into several homogeneous clusters, which confirms that your database is heterogeneous. |
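Three of the considerations above (filtering on a range limit, converting a numerical variable into symbolic levels, and checking how two variables are correlated) can be pictured with a short Python sketch. It is an illustration only and does not reproduce DATAmaestro's record sets, Function variables or correlation matrix. The variable names, sample values and level thresholds are hypothetical, and the use of the Pearson coefficient as the correlation factor is an assumption.

```python
import math

# Hypothetical records; the variable names "rate" and "energy" are made up.
records = [
    {"rate": 900.0, "energy": 45.0},
    {"rate": 1000.0, "energy": 50.0},
    {"rate": 1050.0, "energy": 52.5},
    {"rate": 1200.0, "energy": 60.0},
    {"rate": 1500.0, "energy": 75.0},
]


def filter_records(rows, variable, upper_limit):
    """Keep only the records whose variable is below the limit (a filter rule)."""
    return [row for row in rows if row[variable] < upper_limit]


def to_level(value, low_max, medium_max):
    """Map a numerical value to a symbolic level (thresholds are assumptions)."""
    if value < low_max:
        return "low"
    if value < medium_max:
        return "medium"
    return "high"


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


# Record set: keep the normal variation range (<1100).
normal = filter_records(records, "rate", 1100)

# Symbolic variable derived from the numerical one.
levels = [to_level(row["rate"], 950, 1050) for row in normal]

# Correlation between the two variables across all records.
r_value = pearson(
    [row["rate"] for row in records],
    [row["energy"] for row in records],
)
```

A coefficient close to 1 corresponds to the rising pattern described for scatter plots, a value close to -1 to the falling pattern, and a value near 0 to uncorrelated variables.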

Modeling Techniques

Model learning allows the interactive and iterative use of data mining methods. Some methods produce a model, which expresses the relationships between the input variables and the output variable. The model is added as a new variable which can then be used in turn as an input or output variable in subsequent steps of the data mining process (i.e. hybrid methods).
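As a rough picture of a model becoming a new variable, the Python sketch below fits a straight line between one input and one output, stores the model's predictions as an extra variable, and then reuses that variable in a later step. The least-squares line is only a stand-in for whichever modeling method you choose. The data set, the variable names and the helper function are hypothetical and are not part of DATAmaestro.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("input variable is constant; no line can be fitted")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data set: one input variable and one output variable.
dataset = {
    "temperature": [10.0, 20.0, 30.0, 40.0],
    "yield": [25.0, 45.0, 65.0, 85.0],
}

# Step 1: learn a model that relates the input to the output.
slope, intercept = fit_line(dataset["temperature"], dataset["yield"])

# Step 2: add the model's predictions to the data set as a new variable.
dataset["yield_model"] = [slope * t + intercept for t in dataset["temperature"]]

# Step 3: use the new variable as an input to a later step, here a residual.
dataset["yield_residual"] = [
    actual - predicted
    for actual, predicted in zip(dataset["yield"], dataset["yield_model"])
]
```

The residual variable could itself be explored or modeled in turn, which is the iterative pattern the paragraph above describes.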

For more information, see Models.

Learn More

Try the Demo Project available at https://projects.mydatamaestro.com/static/welcome.html. The Demo Project includes developed features for all the visualizations and models available in DATAmaestro. You can explore the information that they reveal and see how each of these features is created. 

You don't have to be a licensed subscriber to try DATAmaestro; simply sign up for a free trial.