PAW - Process & Analytics Workbench

How to merge data in PAW

PAW allows you to merge data from different data sources, including Oracle, SQL Server and MySQL databases, Excel and CSV files, text files, web pages, and XML files. The steps to merge any 2 types of data are provided below; in this case, we will merge data from an Excel file and a Oracle database. Not only will you merge the data, but you will have built an automated process for this, so if you get an updated file or need to run it later against the database which changes, you can simply run that process with minimal effort.

Step 1: Import data from an Excel File

Be sure to have the Excel file saved to your local machine.

1. From the left hand panel, under the section "Create new process", click on "Import from Excel"


2. The 'Import from Excel' dialog box will open.

a. File: Start by selecting the file to be imported. Click on the "Browse" button to select a file from your computer.

b. Click on 'Next' to proceed.

c. Worksheet: Select the worksheet from the Excel file. As a start, the first worksheet is automatically selected and the data in it is displayed.

d. Starting cell: Click on the cell you want as the starting cell from the grid showing the data in your Excel file.

e. Ending cell: Press the SHIFT key and click the cell you want as the final column. If you have a table of data from A1 through C5, you can click on C1 as the ending cell in which case PAW will import all data in columns A-C until there is a row with no data. You can also select C5 as the ending cell and PAW will import specifically the data in columns A-C and rows 1-5.

f. Click on 'Next' to proceed.

g. Contains header row: If the first row in the Excel table you've selected contains the headers, then check the "Contains header row" option.

h. Convert date values from GMT time: You should not need to change this option unless you're getting issues with bringing in the correct dates. Dates entered in Excel are assumed to be in the local timezone so if you've importing a file where the data entry was done across the globe, this may be of use.

i. Required column: If the data you're importing has sub-totals which you may not want to import, you can specify a column which has not been sub-totaled, and any rows that don't contain data in the specified column will not be imported.

3. Press "Finish and Create Process" to actually import the data

This will import the data into PAW and will also create a process for bringing the data. In case you get an updated file, you can simply "Run process" to re-import the data from the new file. Note that the only "step" in your process is "Import excel file ..." thus far, as highlighted in the picture below.

4. Clicking on the "Go back, I'm done with this" option on the left pane so you can bring in more data

This will take you to the list of all the datasets in your file. Note the highlighted process "Leads2.xls - Leads-CM203" which you just defined.

Step 2: Import data from a MySQL database

You are required to have the userid and password information for the database you're going to be accessing. You will also need to know the SQL to bring the data in.

1. Click "Database query" to start importing data from a database.

The 'Database query' dialog as shown below will open.

2. Click on "Specify new database connection..".

Once the connection is setup, you will be able to re-use the connection for additional database queries.

a. Name: Name your database connection

a. Database type: Database vendor (ex. MySQL, ORACLE, etc..)
If you don't see your database listed, you should contact us at CustomerService@pawanalytics.com for a patch to get your specific database setup.

c. Database URL: This is a specific formatted string containing details about the database that you need to connect to.
We have provided examples for MySQL, ORACLE, and SQL Server or MS Access (using ODBC).

i. For ORACLE, you can replace 'MyOracleHost' with the server running your database and 'MyDB' with the name of the database. The '1521' is the default port but may need to be changed if your database has been setup differently.
ii. For MySQL, the format is slightly different but still contains the server running the database, the port (default is 3306), and the name of the database.
iii. For Microsoft product databases (SQL Server and Access), you would have created a DataSource using the standard Windows tools. You can simply refer to the DataSource name as indicated in the example.

d. User: Provide the user id you use to login to the database directly. Your database administrator may need to create one for you and provide you the userid and password.

e. Password: The password you use to login to the database. This would be provided to you by your database administrator.

f. Test Connection: You can test to see if the parameters provided are correct and you are able to connect to your database as expected.

3. Press "OK" to create the database connection

You will be taken back to the Database query dialog with the database you just created as selected. Click 'Next' to proceed.

4. Enter the SQL for the database query and click on "Preview Results"

5. Specify a process name (ex. "Customer Query") and press "Finish and Create new process" to actually import the data from the database

This will import the data into PAW and will also create a process for bringing the data. Once again, re-running the SQL and bringing the data is as easy as clicking the "Run process".

Step 3: Merge datasets

We will now match the 2 datasets based on matching columns and do a lookup of one of the fields from the database.

1. Go back to the data imported from Excel and click on "Match/merge" under the "Add Steps" section.

You can simply click on the tab for that dataset.

2. The "Match/merge" dialog as follows will be shown:

As you're getting used to the PAW user interface, you need to specify the parameters for how the data should be merged.

a. Merge with dataset: Specify the dataset that you want to merge the leads with. In this case, we'll select 'Customer Query' which we pulled from the database.

b. Click on 'Next'

b. Specify the purpose of the merge: In this case, we want to lookup data from the customer table so we select to "Integrate data and pick fields"

c. Selecting lookup fields: Specify the field(s) that you want to add to your dataset as part of the lookup.

d. Click on 'Next'

e. Matching Criteria: Specify how data from the 2 datasets can be matched. In case, we specify that the 'Email' field from the Excel dataset should be matched against the 'EmailAddress' field from the database.

3. Press "Add step to existing process" to perform the matching

This will perform the matching - the 'Segment' column has been added to the result. Note that this step has been added to the process that imported the data from the Excel file. Hence, when you need to reload data, the matching will be performed automatically as part of a single process. As an option, you can create a separate matching process.

Step 4: You're done!

You have seen how to do the following: import data from an Excel file, setup a database connection and get data from a SQL query, and finally merge the datasets. In doing this, you've created an automated process that can be re-run as is, or additional processing steps can be added with minimal effort.