Records and variables are at the core of data management. This section covers how to define and select records and variables and how to organize them into subsets.
Data Basics
The rows and columns identified in the data source you upload provide the building blocks for your project. The columns in the file become variables, for example, Date, Gas Production, Valve1, Valve2, or Temperature. All the variables in the data source are available to use individually, but not all of them are relevant for every analysis. To get the most out of some DATAmaestro features, you can create Variables sets. A Variables set is simply a set of columns (variables) that you select from all of the available variables in the data source.
Each row in the data source is considered a record. A single data source can have hundreds, or even millions, of records. To use the data effectively, you define rules with operators, for example, a time frame, values greater than X, or values not equal to Y. Each set of rules you create and name is called a Record set. One project can have many Record sets to target different types of data analysis.
In DATAmaestro you can also write your own formula expressions to define a Function Variable. For example, the Function editor lets you:
- transform an output variable type from numerical to symbolic, or the inverse
- compute the ratio of two explicit variables, such as pressure and temperature
- calculate an average for similar variables, such as three temperatures measured on the same equipment
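The three concepts above can be illustrated outside DATAmaestro with a few lines of plain Python. This is only a sketch of the ideas, not DATAmaestro's implementation: the sample rows, the threshold `9.5`, and the names `variable_set`, `high_pressure`, `PT_ratio`, `T_avg` and `Pressure_class` are all assumptions made up for the illustration.

```python
from statistics import mean

# Hypothetical rows, as they might be read from a CSV data source.
records = [
    {"Date": "2024-01-01", "Pressure": 10.0, "Temperature": 50.0, "T1": 49.0, "T2": 50.0, "T3": 51.0},
    {"Date": "2024-01-02", "Pressure": 12.0, "Temperature": 60.0, "T1": 59.0, "T2": 60.0, "T3": 61.0},
    {"Date": "2024-01-03", "Pressure": 9.0, "Temperature": 45.0, "T1": 44.0, "T2": 45.0, "T3": 46.0},
]

# Variables set: a named selection of columns.
variable_set = ["Date", "Pressure", "Temperature"]


# Record set: a named rule (time frame plus threshold) that keeps matching rows.
def high_pressure(row):
    return row["Date"] >= "2024-01-01" and row["Pressure"] > 9.5


record_set = [row for row in records if high_pressure(row)]

# Function variables: new columns computed from existing ones.
for row in records:
    row["PT_ratio"] = row["Pressure"] / row["Temperature"]           # ratio of two variables
    row["T_avg"] = mean([row["T1"], row["T2"], row["T3"]])            # average of similar variables
    row["Pressure_class"] = "high" if row["Pressure"] > 9.5 else "low"  # numerical to symbolic

# Combine both selections: chosen columns for the chosen rows.
subset = [{name: row[name] for name in variable_set} for row in record_set]
print(subset)
```

The point of the sketch is that a Variables set selects columns, a Record set selects rows, and a Function Variable adds a computed column; the two selections can then be combined freely.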
About Data Sources
After you upload a file to the server and load it, the data source is dynamically linked to your project. Every time a data source project item is updated, all the dependent building blocks, for example sets, graphics, and models, are updated automatically.
Info |
---|
title | Data File Protection |
---|
| Your data source file is never altered by the modifications you make to your DATAmaestro project. For example, if you change the name of a variable when you load the CSV file, it is saved as project information and does not change the data source file itself. |
All the data sources you upload and the project information you create are secure and saved in a database. Only authorized users have access.
Info |
---|
title | Multiple Data Sources |
---|
| A project can have one data source file, or many data sources. However, data from different data sources cannot be shared or merged for analysis inside DATAmaestro Analytics. If you want to combine data from two different sources, consider using DATAmaestro Lake to merge the data and then uploading the new file for your project. |
When you create a new project, a page opens asking you to select a data source. There are two main ways to upload data to DATAmaestro and one way to select existing data:
- Quick-start upload: If you have one CSV file ready for analysis, upload it directly to DATAmaestro Analytics and start working. The method is the same as for uploading data in DATAmaestro Lake.
- Upload to Lake: If you have multiple CSV files to merge or one CSV file to resample, upload to DATAmaestro Lake.
- Select from Lake: If you have already uploaded data to DATAmaestro Lake, select it here.
Quick-start Upload
If you have one CSV file ready for analysis, upload it directly to DATAmaestro Analytics and start working. The method is the same as for uploading data in DATAmaestro Lake.
Quick Upload [1 / 2] : Select your data
Click on Choose File and select the .csv file from your disk. When you upload data from a CSV file, the software automatically detects the file format (Column Delimiter and Number Format) and provides a preview of the file content and of how the system reads it. Check that the Column Delimiter and Number Format match those of the uploaded file.
Once a file has been selected, the CSV Preview attempts to interpret your data (date columns, numerical or symbolic variables) and assigns a different color to each column depending on its data type. Hover over the “eye” to display the original value below.
Check the number of header rows. If there are three header rows, click on + Add Header Row to add Row 2 and Row 3. Each header row can be defined as name, title, unit, description or classifier, or it can be skipped (when you don't want to upload that row). In this example, define Row 1 as Name, Row 2 as Title and Row 3 as Unit.
Info |
---|
| Classifiers are metadata saved to a data file (DMFF or CSV). It classifies variables according to different categories. Categories include type (symbolic/numerical), Parameter, Location, etc. For example, the variable “Steel plate thickness” can be classified as a Parameter defined as a “Dimension” and the variable “Chemical type” can be classified as a Parameter defined as a “Chemical”. For more information please check Classify Variables. |
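Before uploading, it can help to check locally what delimiter, number format and header rows a CSV file actually uses. The sketch below uses Python's standard `csv.Sniffer` as a stand-in; it is not DATAmaestro's detection logic. The sample content, the role names in `header_roles` and the helper `parse_number` are assumptions for illustration.

```python
import csv
import io

# Hypothetical file content: semicolon delimiter, decimal comma, three header rows.
raw = (
    "time;temp_in;valve1\n"
    "Timestamp;Inlet temperature;Valve 1 state\n"
    ";degC;\n"
    "2024-01-01 00:00;21,5;open\n"
    "2024-01-01 00:01;21,7;closed\n"
)

# Guess the column delimiter among a few common candidates.
dialect = csv.Sniffer().sniff(raw, delimiters=";,\t")
rows = list(csv.reader(io.StringIO(raw), delimiter=dialect.delimiter))

# Assign a role to each header row, mirroring Row 1 = Name, Row 2 = Title, Row 3 = Unit.
header_roles = ["name", "title", "unit"]
headers = dict(zip(header_roles, rows[: len(header_roles)]))
data_rows = rows[len(header_roles):]


def parse_number(text, decimal_sep=","):
    """Convert a number written with the given decimal separator to a float."""
    return float(text.replace(decimal_sep, "."))


print("delimiter:", repr(dialect.delimiter))
print("units:", headers["unit"])
print("first value:", parse_number(data_rows[0][1]))
```

If the guessed delimiter or the parsed numbers look wrong here, the same mismatch is likely to show up in the CSV Preview, where the Column Delimiter and Number Format can be corrected.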
Check that the characters are displayed correctly. If there is an odd symbol, click on Characters not displaying correctly? and try each Charset until the character displays correctly. The options are UTF-8, ISO-8859-1, windows-1252 and Mac Roman. If you need more information about the different charsets, see charset.
If you want to search for a variable, enter its name in the Filter variables field.
Click Next.
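The effect of choosing the wrong charset can be reproduced with a short Python sketch that decodes the same bytes with each of the four options. The codec names are Python's own aliases for those charsets, and the sample text is an assumption chosen to contain non-ASCII characters.

```python
# Python codec names for the four charsets offered in the upload dialog.
CHARSETS = ["utf-8", "iso-8859-1", "windows-1252", "mac_roman"]

# Hypothetical column title, saved by an application that wrote windows-1252.
original = "Température °C"
raw_bytes = original.encode("windows-1252")


def decode_with_each(data, charsets=CHARSETS):
    """Return the decoded text per charset, or None when the bytes are invalid."""
    results = {}
    for name in charsets:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


for name, text in decode_with_each(raw_bytes).items():
    print(f"{name:>13}: {text!r}")
```

Only some of the charsets reproduce the original text; the others either fail or show odd symbols, which is the situation the Characters not displaying correctly? option is meant to resolve.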
Quick Upload [2 / 2] : Define and verify variables as time, text or numbers
The system should detect columns as Time, Numerical or Symbolic variables. Check that the columns have been correctly identified. For each column, you can change the headers and variable type by clicking on it. Click on the first column to see and edit the additional information for that column (Name, Title, Type, Time Format, Time Zone). If the information has not been detected correctly, correct it manually.
Click Upload to upload the file to the DATAmaestro project. The uploaded file appears in the project as a DMFF file.
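To make the Time / Numerical / Symbolic distinction concrete, here is a minimal sketch of how a column of text values could be classified. The rule used (try a time format, then try a number, otherwise symbolic) and the default `time_format` are assumptions for illustration; DATAmaestro's own detection may work differently.

```python
from datetime import datetime


def infer_type(values, time_format="%Y-%m-%d %H:%M"):
    """Classify a column as 'time', 'numerical' or 'symbolic'.

    Empty strings are treated as missing values and ignored.
    """
    present = [v for v in values if v != ""]
    if not present:
        return "symbolic"

    def all_parse(parser):
        # True only if every present value is accepted by the parser.
        try:
            for v in present:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(lambda v: datetime.strptime(v, time_format)):
        return "time"
    if all_parse(float):
        return "numerical"
    return "symbolic"


print(infer_type(["2024-01-01 00:00", "2024-01-01 00:01"]))
print(infer_type(["21.5", "", "21.7"]))
print(infer_type(["open", "closed"]))
```

A single unparseable value is enough to make a column symbolic under this rule, which is one reason to verify each column's detected type before uploading.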
Upload to Lake
If you have multiple CSV files to merge or one CSV file to resample, upload to DATAmaestro Lake. This is equivalent to uploading data directly in DATAmaestro Lake; for more information, see DATAmaestro Lake - Upload.
Select from Lake
If you have already uploaded data to DATAmaestro Lake, select it here. This is equivalent to exporting data from DATAmaestro Lake; for more information about exporting from Lake, see DATAmaestro Lake - Export.
- Click on Data and then Select from Lake.
- In the Properties tab, select the From and To dates. Make sure the dates correspond to the period covered by the CSV file.
- Check the Interval box. The interval corresponds to how often the data is collected, e.g., 1 min, 30 min, 1 h.
- Browse to a folder, then search for and select the tags to be visualized. Data from different directories can be merged into a single export file. To help select tags, there is a text search and filter option above the tag list (Name, Title and Unit).
- Click on the small arrow to the right of the text search area to see the total number of tags, the number of tags selected, and the number of tags filtered.
- Select the Method to be used; for more information about the methods, see DATAmaestro Lake - Export. Different methods can also be used within the same export file: change the method, then select the variable(s). The selected method is displayed beside the variable. More information on this topic follows below.
- Select the tags to be exported. Once your tags are selected, click on the white arrow to move them to the column on the right, where you can verify all tags that are ready to be exported.
- To remove tags from the selected tags list, click on the tag(s) you would like to remove and then click the left arrow.
- In Var. naming, select how the tag name should be exported. The default option is Tag Name Only, but it is also possible to add the Subpath (sub/tag) or the Full Path (/project/sub/tag).
- In File Name, a default name is provided. You may change it.
- Then, click Save.
Info |
---|
| To select different methods for different variables, first select the method, then select the variables that should use that method and move them across to the right-hand side. Change the method and repeat the variable selection process. Alternatively, the method can be changed for each tag individually: in the list of selected tags, click on the method name beside the variable to display a list of all methods, then select the method you want. The method shown beside the selected tag changes accordingly. |
- A window appears confirming the size of the data file and it asks if you want to proceed. Click Extract.
- When tags are finally extracted, click on Load data in. Your data is in DATAmaestro Analytics!
- You will be directed to Classify. For more information on how to classify a parameter, please check DATAmaestro Analytics – Classify Variables.
- Optional: In the Parameters tab, click on Retrieve to check all variable types, titles and units. You can also edit variable names, titles and units here.
- Click Save.
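The Interval and Method settings in the steps above amount to resampling each tag onto a regular time grid. The sketch below shows the general idea in plain Python. The method names (`average`, `min`, `max`, `last`) and the function `resample` are assumptions for illustration; the methods actually available are described in DATAmaestro Lake - Export.

```python
from datetime import datetime, timedelta
from statistics import mean


def resample(samples, start, end, interval, method="average"):
    """Aggregate (timestamp, value) samples into fixed intervals in [start, end).

    Returns a list of (interval_start, value); value is None for empty intervals.
    """
    aggregate = {
        "average": mean,
        "min": min,
        "max": max,
        "last": lambda values: values[-1],
    }[method]
    ordered = sorted(samples)  # "last" relies on time order
    out = []
    bucket = start
    while bucket < end:
        values = [v for t, v in ordered if bucket <= t < bucket + interval]
        out.append((bucket, aggregate(values) if values else None))
        bucket += interval
    return out


# Hypothetical tag sampled every 20 seconds for 100 seconds.
t0 = datetime(2024, 1, 1, 0, 0)
samples = [(t0 + timedelta(seconds=20 * i), float(i)) for i in range(6)]

# Export From t0 To t0 + 3 min with a 1 min interval.
result = resample(samples, t0, t0 + timedelta(minutes=3), timedelta(minutes=1))
for bucket_start, value in result:
    print(bucket_start, value)
```

The last interval has no samples and therefore no value, which illustrates why the From and To dates should match the period the data actually covers.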
Info |
---|
title | Export data from Lake |
---|
| You can export data to a DATAmaestro Analytics project directly from DATAmaestro Lake. However, the main advantage of selecting data from Lake in DATAmaestro Analytics is that it is saved directly in your current project. |
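The three Var. naming options used during export differ only in how much of the tag's path is kept in the variable name. A minimal sketch, assuming a path of the form /project/sub/tag as in the examples given for that option; the function `export_name` and its `mode` values are hypothetical.

```python
def export_name(full_path, mode="tag"):
    """Build the exported variable name from a Lake tag path.

    mode: "tag" (Tag Name Only), "subpath" (Subpath) or "full" (Full Path).
    """
    parts = [p for p in full_path.split("/") if p]
    if mode == "tag":
        return parts[-1]
    if mode == "subpath":
        return "/".join(parts[1:])  # drop the leading project folder
    if mode == "full":
        return "/" + "/".join(parts)
    raise ValueError(f"unknown naming mode: {mode}")


for mode in ("tag", "subpath", "full"):
    print(mode, "->", export_name("/project/sub/tag", mode))
```

Keeping the Subpath or Full Path can help when tags from different folders share the same tag name and would otherwise be hard to tell apart in a merged export.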
Info |
---|
title | Edit data in Analytics |
---|
| Now that you have a data source in your project, you will learn how to edit your data extraction. You can edit your data extraction to add or remove tags, update the time period, change the sampling, etc. |
- Your data extraction is stored in your Data Sources area. If you need to edit your selection, time period or add new tags to your Project, on the left bar, click on the Edit icon beside Data Extraction.
- In the selection area, you will find the Lake folder and subfolders. You can navigate to different folders to select variables. Select the new tags to be exported to the data source. Once you have selected your tags, click on the arrow to move them to the column on the right, where you can verify that all new tags are ready to be exported. Remember that if the new tags need to be exported using a different method, first select the method and then the tags.
- Click Save.
- When tags are extracted, you will be redirected to this page. Here there are two options:
- To replace the existing datasource (all tasks will be automatically updated): Click on Load data in without changing the selected file in the drop down (Recommended).
- To add a new datasource to your project, select in the drop down New datasource. It is possible to work with multiple datasources in a project, however each task can only use one datasource (no merging provided).
To replace the existing datasource:
- Replacing will automatically update all tasks within the project that use that datasource.
- Choose the datasource name. NB: If the file name already exists, the file will be overwritten.
- Confirm by clicking “Extract and replace data in existing datasource”.
Info |
---|
| Note that a DMFF file replaces any existing DMFF file with the same name. This means that if the same file is used in two different Analytics projects (for example, if you create a copy of a project) and the DMFF is updated in only one of them, the file is automatically updated in the other project too, because both projects refer to the same file name. If you want to avoid overwriting, upload the new DMFF with a different name or change the name when extracting from the Lake. |
Look into your dataset
- To open your file (table), click on the Data icon on the left vertical bar.
- Check the Data tab, where the sampled data is found. Scroll to the right to check all variables.
- Check the Summary tab, where basic statistics can be found at a glance. Scroll down to check all variables.
Filter the values in Summary tab and create a new variable set
- To open the drop-down Filter option, click on the Filter icon in the column header. E.g., find the column “Number of missing” (values) and click on its Filter icon. A drop-down menu appears with a Text Filter.
- In the Text Filter box, select among the different options, e.g., “equals” with the value “0”. This keeps only the variables whose number of missing values equals zero; tags with missing values > 0 are excluded.
- Click OK.
- Click the checkbox column to select all tags whose number of missing values equals 0. You can select them one by one, or click on the checkbox header to select all of them.
- Click on More Actions, then select Variable set under Variable Selection. Name this variable set “Selection of Tags with No Missing values”.
Info |
---|
title | Check data integrity |
---|
| The first check you can do is on the data source Summary tab, where you can see basic statistics for all variables (average, min, max, st dev, nb missing and nb values) and quickly filter and sort variables. Typically, a quick check is done to remove variables with a high percentage of missing values and variables whose standard deviation is zero (constant values). |
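The integrity check described above can be sketched in plain Python: compute the same basic statistics per variable, then keep the variables with no missing values and flag the constant ones. The sample columns and the names `summarize`, `no_missing` and `constant` are assumptions for illustration, and missing values are represented here as `None`.

```python
from statistics import mean, stdev

# Hypothetical data source: one list of values per variable, None = missing.
columns = {
    "Temperature": [50.0, 52.0, None, 51.0],
    "Pressure": [10.0, 12.0, 9.0, 11.0],
    "Setpoint": [5.0, 5.0, 5.0, 5.0],
}


def summarize(values):
    """Basic statistics for one variable; assumes at least one present value."""
    present = [v for v in values if v is not None]
    return {
        "average": mean(present),
        "min": min(present),
        "max": max(present),
        "st_dev": stdev(present) if len(present) > 1 else 0.0,
        "nb_missing": len(values) - len(present),
        "nb_values": len(present),
    }


summary = {name: summarize(values) for name, values in columns.items()}

# Equivalent of filtering "Number of missing" with "equals 0".
no_missing = [name for name, stats in summary.items() if stats["nb_missing"] == 0]

# Variables that never change carry no information for most analyses.
constant = [name for name, stats in summary.items() if stats["st_dev"] == 0]

print("no missing values:", no_missing)
print("constant:", constant)
```

In this sketch, `no_missing` plays the role of a variable set such as “Selection of Tags with No Missing values”, and `constant` lists candidates to leave out of it.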
Tip |
---|
| For example, one thing I do to check data is to plot one parameter on a histogram or a trend. A second good check is to plot key variables, such as production rate or key pressures, temperatures and speeds, to look for problematic periods or outliers. |
Info |
---|
| An underlined class of a symbolic variable indicates a white space (at the beginning, middle or end of the symbol). When creating Record sets based on symbolic variables with spaces, it is recommended to copy them directly from the Data Summary in order to capture all spaces. |
|
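Why the white space matters can be seen in a short Python sketch: two classes that look identical on screen are different values when compared exactly, so a rule typed by hand may silently miss records. The class names and rows below are assumptions for illustration.

```python
# Hypothetical classes of a symbolic variable; some carry white space.
classes = ["OPEN", "OPEN ", " CLOSED", "HALF OPEN"]

# Classes containing white space at the beginning, in the middle or at the end.
with_spaces = [c for c in classes if any(ch.isspace() for ch in c)]

# A rule typed by hand as "OPEN" does not match the class "OPEN ".
rows = [{"Valve1": "OPEN"}, {"Valve1": "OPEN "}, {"Valve1": " CLOSED"}]
typed_rule = [row for row in rows if row["Valve1"] == "OPEN"]
copied_rule = [row for row in rows if row["Valve1"] == "OPEN "]

# repr() makes the otherwise invisible spaces visible.
print([repr(c) for c in with_spaces])
print(len(typed_rule), len(copied_rule))
```

Copying the class directly from the Data Summary corresponds to `copied_rule` here: the exact value, spaces included, ends up in the rule.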