Processes and Steps
A core concept in PAW is the process, which consists of processing steps. PAW builds the process by capturing each step as the user manipulates the data to suit their needs. The process can then be re-run automatically.
What is a process?
A process results in a dataset. Each time the process is run, the resulting dataset is overwritten with the new results. Processes are often chained together, so that one process uses the results of another. This is common when the same dataset needs to be processed in multiple ways.
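The overwrite-and-chain behavior described above can be sketched in a few lines of Python. This is only an illustration, not PAW's implementation: the `datasets` dictionary, the `run_process` function, and the dataset names are all hypothetical.

```python
# Hypothetical in-memory store: dataset name -> list of rows.
datasets = {}

def run_process(name, source, steps):
    """Run `steps` over the rows produced by `source` and store the result
    under `name`, replacing whatever an earlier run left there."""
    rows = source()
    for step in steps:
        rows = step(rows)
    datasets[name] = rows  # overwrite, never append
    return rows

raw_hours = [
    {"project": "A", "hours": 5},
    {"project": "B", "hours": 3},
    {"project": "A", "hours": 2},
]

# First process: keep only rows with positive hours.
run_process(
    "clean_hours",
    source=lambda: raw_hours,
    steps=[lambda rows: [r for r in rows if r["hours"] > 0]],
)

# Two further processes both start from the result of the first one,
# so the same dataset is processed in two different ways.
run_process(
    "project_a_hours",
    source=lambda: datasets["clean_hours"],
    steps=[lambda rows: [r for r in rows if r["project"] == "A"]],
)
run_process(
    "total_hours",
    source=lambda: datasets["clean_hours"],
    steps=[lambda rows: [{"hours": sum(r["hours"] for r in rows)}]],
)
```

Running any of these processes again replaces its entry in `datasets`, which mirrors how a re-run overwrites the previous results.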
For example, Timesheet Summary is a process consisting of the
following steps: import data from an Excel file, merge it, filter by project,
and then summarize it.
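As a rough sketch, the four Timesheet Summary steps could look like this in Python. Everything here is assumed for illustration: the column names, the employee lookup that the data is merged with, the "Apollo" project, and the in-memory rows that stand in for the Excel file.

```python
from collections import defaultdict

def import_timesheet():
    # Stand-in for the Excel import; a real run would read the file instead.
    return [
        {"employee_id": 1, "project": "Apollo", "hours": 6.0},
        {"employee_id": 2, "project": "Apollo", "hours": 4.5},
        {"employee_id": 1, "project": "Zeus", "hours": 2.0},
        {"employee_id": 1, "project": "Apollo", "hours": 1.5},
    ]

# Made-up lookup data to merge against (assumption, not from PAW).
EMPLOYEES = {1: "Ana", 2: "Ben"}

def merge_employees(rows):
    # Merge step: attach the employee name to every timesheet row.
    return [{**r, "employee": EMPLOYEES[r["employee_id"]]} for r in rows]

def filter_by_project(rows, project="Apollo"):
    # Filter step: keep only rows for one project.
    return [r for r in rows if r["project"] == project]

def summarize(rows):
    # Summarize step: total hours per employee, sorted by name.
    totals = defaultdict(float)
    for r in rows:
        totals[r["employee"]] += r["hours"]
    return [
        {"employee": name, "total_hours": hours}
        for name, hours in sorted(totals.items())
    ]

def timesheet_summary():
    rows = import_timesheet()
    for step in (merge_employees, filter_by_project, summarize):
        rows = step(rows)
    return rows
```

Calling `timesheet_summary()` returns one row per employee with the hours they logged on the chosen project.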
Iterative and Data Centric
As you build a process, the resulting dataset is displayed immediately
with each step. For example, as you import data from somewhere (e.g. a database), the
result of the query is shown. Then, as you join it with another dataset,
the joined result is shown. In this manner, with each step, you see exactly
how your data will look. You end up focusing on the data rather than the
process, which lets you stay in the business context.
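The step-by-step feedback described above can be imitated with a generator that hands back the dataset after every step. This is a minimal sketch under assumed names (`run_with_previews`, the sample `orders` and `customers` data); it does not reflect how PAW renders results.

```python
def run_with_previews(rows, steps):
    """Yield (step name, resulting rows) after every step."""
    for name, step in steps:
        rows = step(rows)
        yield name, rows

# Sample data standing in for the result of a database query.
orders = [
    {"customer_id": 1, "amount": 40},
    {"customer_id": 2, "amount": 15},
    {"customer_id": 1, "amount": 25},
]
customers = {1: "Acme", 2: "Globex"}

steps = [
    ("join customers",
     lambda rows: [{**r, "customer": customers[r["customer_id"]]} for r in rows]),
    ("filter amount >= 20",
     lambda rows: [r for r in rows if r["amount"] >= 20]),
]

previews = list(run_with_previews(orders, steps))
for name, rows in previews:
    print(f"after {name}: {len(rows)} rows")
    for r in rows:
        print("  ", r)
```

Each preview shows the data as it stands after that step, so the effect of a join or filter is visible before the next step is added.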
Types of Steps
There are three types of steps in a process.
- Data sourcing step - each process starts with one step
that sources the data from somewhere. This can either be an external
data source like a website or an internal source such as another
dataset.
- Data processing step - a process can have one or many processing
steps. These include operations such as filtering, summarizing, joining,
and calculating fields. The data is passed through a chain
of these steps, with each step adding to or changing
the sourced data.
- Data output step - a process can have an optional data output
step that sends the data out to a file. Often,
data output can be treated as a data processing step; loading data
into a database is one example.
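The three step types can be modeled as one sourcing function, a list of processing functions, and an optional output function. The sketch below is an assumption-laden illustration: the `Process` class, `add_total`, `write_csv`, and the sample rows are hypothetical, and an in-memory buffer stands in for the output file.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Callable, Optional

Rows = list[dict]

@dataclass
class Process:
    source: Callable[[], Rows]                    # exactly one sourcing step
    steps: list[Callable[[Rows], Rows]] = field(default_factory=list)
    output: Optional[Callable[[Rows], None]] = None  # optional output step

    def run(self) -> Rows:
        rows = self.source()
        for step in self.steps:      # each step adds to or changes the data
            rows = step(rows)
        if self.output is not None:
            self.output(rows)
        return rows

def add_total(rows):
    # Processing step: calculate a new field.
    return [{**r, "total": r["qty"] * r["price"]} for r in rows]

def write_csv(buffer):
    # Build an output step that writes rows as CSV to `buffer`.
    def output(rows):
        if not rows:
            return
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return output

buffer = io.StringIO()  # stands in for a file on disk
process = Process(
    source=lambda: [
        {"item": "pen", "qty": 3, "price": 2},
        {"item": "ink", "qty": 1, "price": 9},
    ],
    steps=[add_total, lambda rows: [r for r in rows if r["total"] > 6]],
    output=write_csv(buffer),
)
result = process.run()
```

Leaving `output` as `None` gives a process that only produces its dataset, which matches the output step being optional.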
