Data processing in PAW
This tutorial shows you how to import data from a CSV file, massage and enrich the data with some reference data kept in an Excel file and some managed manually in PAW, and then finally load it into a database. Click on each step to see the details.
In this scenario, we will be enhancing timesheets containing hours spent for employees with employee titles from one source and title to billing rates from a table to calculate the billing for a client.
> Step 1: Import reference mapping data from Excel file (employee titles)
Be sure to have the Excel file saved to your local machine.
1. From the left hand panel, under the section "Create new process", click on "Import from Excel"

2. Select the Excel file to import.

3. You will be the "Import Excel File" dialog.

a. Worksheet: Select the worksheet from the Excel file. As a start, the first worksheet is automatically selected and the data in it is displayed.
b. Starting cell: Click on the cell you want as the starting cell from the grid showing the data in your Excel file.
c. Ending cell: Press the SHIFT key and click the cell you want as the final column. If you have a table of data from A1 through C5, you can click on C1 as the ending cell in which case PAW will import all data in columns A-C until there is a row with no data. You can also select C5 as the ending cell and PAW will import specifically the data in columns A-C and rows 1-5.
d. Treat first row as header: If the first row in the Excel table you've selected contains the headers, then check the "Contains header row" option.
e. Click on 'Import Preview' to view the result from your inputs. Press 'Refresh data' to see the data.

4. Press "Import" to actually import the data
This will import the data into PAW and will also create a process for bringing the data. In case you get an updated file, you can simply "Refresh" to re-import the data from the new file. Note that the only "step" in your process is "Import excel file ..." thus far, as highlighted in the picture below.

5. Since you will be using the Employee ID and PAW imports all numbers from Excel as decimals, we will convert the decimal for Employee ID to a number. Click on 'Change field types' under 'Convert'.

a. Click 'conversion options' on the 'ID' field as follows:

b. Specify to 'Convert to' Number to convert the decimal into a number.

c. Click 'Convert' to complete this step.
> Step 2: Setup a table for additional lookup data (billing rates by title)
1. From the left hand panel, under the section "Manage data", click on "Create new table"

2. The "Add Table" dialog will open.

a. Table Name: Provide the name of the table
b. Add fields: Now specify the fields you need for your lookup table. Click on '<add field>' and then specify the name and type of data for each field.

3. Click on 'Create Table' and then enter data into the table.

> Step 3: Import CSV file and merge with other datasets (timesheet with lookups for titles and billing rates)
Once again, ensure that you have the CSV file saved to your local machine (or on a network drive).
1. Click on "Import CSV file" from the "Import" menu and select the file to import.

a. Confirm if the file contains a header row.
b. Verify the data will to be imported as expected.

2. Set the name of the dataset and press "Import" to actually import the data
This will import the timesheet data into PAW and will also create a process for bringing the data.

3. Importing from CSV brings in data as strings. We want to convert EmpID to a number, Hours to a decimal, and Week end date to a date. Click on 'Convert Field Type' under the 'Convert' menu and specify the conversion as follows:

4. To add the lookup field for the title, click on the "Integrate" option under the "Consolidate" menu.
The "Integrate data" dialog will open.

a. Integrate with dataset: Select the dataset to pull the lookup field from.
b. Match conditions: Click '<specify how to match records>' and then pick the matching fields from the timesheet and the employee list
c. Get fields: You will be asked which fields from the matching dataset you want. In our case, want the 'Title' field only.

5. Click on "Integrate" and then dataset will be refreshed with the additional 'Title' field.

6. Do the same "Integrate data" process with the billing rates table to get the rates to be used for the timesheet records.
We want to get the rates, matching on the employee's title.

7. You can filter out records by value. Go to a record having "Not billable" and select "Quick Filter Row" from the "Refine" menu.
We want to filter out records where the time-entry is marked as "Not billable".

a. Select the field(s) that you want to filter on. Then click on "Exclude matching items".
8. Next you want to calculate the billing amount. Click on the "Add formula" under the "Refine" menu.
This will bring up the "Add calculated fields" dialog.

a. Click on '<add calculated field>' and then specify the field name 'Billing Amount', type as 'Decimal' and the formula to use. You can view the variables available to perform the calculation by clicking on the 'Existing Fields' tab.

b. Click on 'Calculate' to add the calculation.

9. Summarize the timesheet by project and date for the billing amount. Click on 'Summarize' under the 'Refine' menu.
This will bring up the "Summarize data" dialog.

a. Group by fields: Select the fields to summarize the data on. Here, we want to summarize by project and week ending date.

b. Summarize by fields: Select which data you want to summarize, which in this case will be the billing amount. You will also need to select the function for the summary - "SUM".

c. Click on 'Save as new dataset' and specify a new dataset name. You now have the dataset in final form to load to a database.

> Step 4: Load dataset into a database (load billing information for further processing)
You should currently be viewing the dataset that you want to load into the database. For this example, we will assume that we have already setup a database connection in PAW.
1. Under the "Export" menu, click on "Load data to table/DB"

2. The "Store data to table/database" dialog will open.

a. Specify whether to load to a database or a local table that you've created in PAW. We will select loading to a separate database.
b. Select the database and then the table to load into.

c. Specify whether you will be updating existing records or loading new records. In this case we will just be loading.
d. Map the fields from the dataset to the fields on the selected table.

g. Click on 'Save as new dataset' and specify a name. Finally click on 'Save' and this will perform the data load. You can determine which records were inserted and if there were any errors.

In this sample, you've seen how to do the following: import from Excel, import from CSV, convert fields, filter records, summarize data, create tables and use them, merge data, and finally load data into a database. In doing this, you've created an automated process that can be re-run as is, or additional processing steps can be added with minimal effort.