PAW - Process & Analytics Workbench

Transform The Way You Work With Data. Massage - Integrate - Automate

14-Day Free Trial
Buy Now

How to merge data in PAW

PAW allows you to merge data from different data sources, including Oracle, SQL Server and MySQL databases, Excel and CSV files, text files, web pages, and XML files. The steps to merge any 2 types of data are provided below; in this case, we will merge data from an Excel file and a Oracle database. Not only will you merge the data, but you will have built an automated process for this, so if you get an updated file or need to run it later against the database which changes, you can simply run that process with minimal effort.

Step 1: Import data from an Excel File

Be sure to have the Excel file saved to your local machine.

1. From the menu, under the section "Import", click on "Import from Excel"


2. Select the Excel file to import.


3. You will be the "Import Excel File" dialog.

a. Worksheet: Select the worksheet from the Excel file. As a start, the first worksheet is automatically selected and the data in it is displayed.

b. Starting cell: Click on the cell you want as the starting cell from the grid showing the data in your Excel file.

c. Ending cell: Press the SHIFT key and click the cell you want as the final column. If you have a table of data from A1 through C5, you can click on C1 as the ending cell in which case PAW will import all data in columns A-C until there is a row with no data. You can also select C5 as the ending cell and PAW will import specifically the data in columns A-C and rows 1-5.

d. Treat first row as header: If the first row in the Excel table you've selected contains the headers, then check the "Contains header row" option.

e. Click on 'Import Preview' to view the result from your inputs. Press 'Refresh data' to see the data.


4. Press "Import" to actually import the data

This will import the data into PAW and will also create a process for bringing the data. In case you get an updated file, you can simply "Refresh" to re-import the data from the new file. Note that the only "step" in your process is "Import excel file ..." thus far, as highlighted in the picture below.

Step 2: Import data from a MySQL database

You are required to have the userid and password information for the database you're going to be accessing. You will also need to know the SQL to bring the data in.

1. Click "Database Query" under the "Import" menu to start importing data from a database.

The 'Database query' dialog as shown below will open.

2. Click on "Add database".

Once the connection is setup, you will be able to re-use the connection for additional database queries.

3. Select '<add database type>' from the Database Type option.

In specifying the information about database types, you will provide details on how PAW should connect to the database.

a. Database Type: Provide a name for the database type (ex. MySQL)

b. Driver Class: Specify the code that connects to the database. Sample configurations are provided for the primary databases. You can contact us at CustomerService@pawanalytics.com for help if necessary.

c. Code Libraries: Select the file path to the code library that is provided by the database vendor to connect to the database. You should get this from your IT support person and keep it on your local computer. You can also download these from the Internet by going to the database vendor's website.

d. Click on 'Ok' once you've specified the above information to setup the database type.

4. Now fill in the details for the server, username, and password for your database.

a. Name: Name your database connection

b. Database type: Database vendor (ex. MySQL, ORACLE, etc..)
If you don't see your database listed, you should contact us at CustomerService@pawanalytics.com for a patch to get your specific database setup.

c. Database URL: This is a specific formatted string containing details about the database that you need to connect to.
We have provided examples for MySQL, ORACLE, and SQL Server or MS Access (using ODBC).

i. For ORACLE, you can replace 'MyOracleHost' with the server running your database and 'MyDB' with the name of the database. The '1521' is the default port but may need to be changed if your database has been setup differently.
ii. For MySQL, the format is slightly different but still contains the server running the database, the port (default is 3306), and the name of the database.
iii. For Microsoft product databases (SQL Server and Access), you would have created a DataSource using the standard Windows tools. You can simply refer to the DataSource name as indicated in the example.

d. User: Provide the user id you use to login to the database directly. Your database administrator may need to create one for you and provide you the userid and password.

e. Password: The password you use to login to the database. This would be provided to you by your database administrator.

f. Test Connection: You should test to see if the parameters provided are correct and you are able to connect to your database as expected.

g. Press "OK" to create the database connection.

5. Enter the SQL for the database query and click on "Refresh data"

See the results on the "Import Preview" tab.

6. Specify a process name (ex. "Customers Query") and press "Import" to actually import the data from the database

This will import the data into PAW and will also create a process for bringing the data. Once again, re-running the SQL and bringing the data is as easy as clicking the "Refresh".

Step 3: Merge datasets

We will now match the 2 datasets based on matching columns and do a lookup of one of the fields from the database.

1. Go back to the data imported from Excel and click on "Integrate" under the "Consolidate" menu.

You can simply click on the tab for that dataset.

2. The "Integrate data" dialog as follows will be shown:

As you're getting used to the PAW user interface, you need to specify the parameters for how the data should be merged.

a. Integrate with dataset: Specify the dataset that you want to merge the leads with. In this case, we'll select 'Customers Query' which we pulled from the database.

b. Specify how to match fields: Click on '<specify how to match fields>' to add matching field options. Select the fields from the 2 datasets that match. You can add multiple match conditions.

d. Get fields: Specify the field(s) that you want to add to your dataset as part of the lookup.

3. Press "Integrate" to perform the matching

This will perform the matching - the 'Segment' column has been added to the result. Note that this step has been added to the process that imported the data from the Excel file. Hence, when you need to reload data, the matching will be performed automatically as part of a single process. As an option, you can create a separate matching process.

Step 4: You're done!

You have seen how to do the following: import data from an Excel file, setup a database connection and get data from a SQL query, and finally merge the datasets. In doing this, you've created an automated process that can be re-run as is, or additional processing steps can be added with minimal effort.