Data processing in PAW
This tutorial shows you how to import data from a CSV file, massage and enrich the data with some reference data kept in an Excel file and some managed manually in PAW, and then finally load it into a database. Click on each step to see the details.
In this scenario, we will be enhancing timesheets containing hours spent for employees with employee titles from one source and title to billing rates from a table to calculate the billing for a client.
> Step 1: Import reference mapping data from Excel file (employee titles)
1. From the left hand panel, under the section "Create new process", click on "Import from Excel"

2. The 'Import Excel file' dialog box will open.

a. File: Start by selecting the file to be imported. Click on the "Browse" button to select a file from your computer and select the file to import and click "OK".
b. Go to the "Next" section in the wizard.

c. Worksheet: Select the worksheet from the Excel file. As a start, the first worksheet is automatically selected and the data in it is displayed.
d. Starting cell: Click on the cell you want as the starting cell from the grid showing the data in your Excel file.
e. Ending cell: Then, click the cell you want as the final column. If you have a table of data from A1 through C5, you can click on C1 as the ending cell in which case PAW will import all data in columns A-C until there is a row with no data. You can also select C5 as the ending cell and PAW will import specifically the data in columns A-C and rows 1-5.
f. Go to the "Next" section in the wizard.

g. Contains header row: If the first row in the Excel table you've selected contains the headers, then check the "Contains header row" option.
h. Required column: If the Excel file contains summary rows, you can specify to skip importing rows where a certain column is blank.
i. Click on "Update Preview" to see what the imported data will look like.
3. Press "Finish and Create New Process" to actually import the data
This will import the data into PAW and will also create a process for bringing the data. In case you get an updated file, you can simply "Run process" to re-import the data from the new file. Note that the only "step" in your process is "Import excel file ..." thus far, as highlighted in the picture below.
4. Click on the "Go back, I'm done with this" option on the left pane so you can bring in more data
This will take you to the list of all the datasets in your file. Note the highlighted process "Employee List.xls - Sheet1" which you just defined.> Step 2: Setup a table for additional lookup data (billing rates by title)
1. From the left hand panel, under the section "Manage data", click on "Create new table"

2. The "Add Table" dialog will open.

a. Now specify the fields you need for your lookup table. Specify the name and type of data for the field and click on "Add Field" to add each field to the list.

b. Provide the name of the table and click on "Create Table >".

5. Your table will be created and go ahead and enter data into the table.

6. Click on the "Go back, I'm done with this" option on the left pane so you can bring in more data
This will take you to the list of all the datasets in your file.> Step 3: Import CSV file and merge with other datasets (timesheet with lookups for titles and billing rates)
1. Click on "Import CSV file" from the menu and the "Import CSV file" dialog will open.

a. Specify the filename (you can "Browse" for the file) and also if it contains a header row.
b. Click "Update Preview" to verify that the imported data will be as expected.

2. Specify the name of the process and press "Finish and Create New Process" to actually import the data
This will import the timesheet data into PAW and will also create a process for bringing the data.
3. To add the lookup field for the title, click on the "Match/merge" option under the "Add Steps" section from the left pane.

4. The "Match/merge data" dialog will open.

a. Specify the dataset to merge with.
b. Then click 'Next'.

b. Specify why we are merging the data. In our case, it is because we're performing a lookup. Select "Integrate data and pick fields from the matching dataset".
c. You will be asked which fields from the matching dataset you want. In our case, want the 'Title' field only.

d. Then click 'Next'.

e. Select which fields match between the timesheet and the employee lookup dataset. Click on the field to match and then select the corresponding field.

5. Click on "Add step to existing process" and then dataset will be refreshed with the additional 'Title' field.

6. Do the same Match/merge process with the billing rates table to get the rates to be used for the timesheet records.
We want to get the rates, matching on the employee's title.
7. You can filter out records by value. Go to a record having "Not billable" and right click and select "Filter" from the menu.
We want to filter out records where the time-entry is marked as "Not billable".
a. Select the field(s) that you want to filter on. Then click on "Exclude matching items".
8. CSV files don't contain data types and so all data is imported as text. For the decimal values, you will need to convert from text to decimal. Click on 'Convert fields' from the 'Add Steps' section on the left pane.

a. Select the 'Hours' field and click on 'Convert' to specify the conversion options. Click on the 'convert fields' option, then specify the type as 'decimal' and finally provide the format to parse the text data. You should click on 'Test Conversion' to validate your settings and the parsing format.

b. After clicking 'OK' on the conversion settings, you should see the following dialog. Go ahead and click on 'Add step to existing process'.

9. Next you want to calculate the billing amount. Click on the "Add formula" under the "Add Steps" section from the left pane.
This will bring up the "Add calculated fields" dialog.
a. Specify the field name 'Billing Amount', type as 'Decimal' and the formula to use. You can view the variables available to perform the calculation by clicking on 'View field variables'.

b. Click on 'Add' to add that field formula, you should see the following dialog. Go ahead and click on 'Add step to existing process'.

10. Summarize the timesheet by project and date for the billing amount. Click on 'Summarize' under the 'Add Steps' section from the left pane.
This will bring up the "Summarize data" dialog.
a. One by one, select the field(s) to summarize on and click on 'Group By'.

b. Similarly, select the field(s) for which data will be summarized and click on 'Summarize by' and you will need to specify the function to apply.

c. You will end up with the following setup.

d. Specify a new dataset name and click on 'Create new process'

11. You now have the dataset in final form and want to load it into a database.
> Step 4: Load dataset into a database (load billing information for further processing)
1. From the left hand panel, under the "Add steps" section, click on "Load data to table/DB"

2. The "Store data to table/database" dialog will open.

a. Specify whether to load to a database or a local table that you've created in PAW. We will select loading to a separate database.
b. Select the database and then the table to load into.

c. Click on 'Next >'.
d. Specify whether you will be updating existing records or loading new records. In this case we will just be loading.

e. Click on 'Next >'.
f. Map the fields from the dataset to the fields on the selected table.
g. Specify a new process name and click on 'Create new process'

h. This will create a new dataset and you can determine which records were inserted and if there were any errors.

