Architecture

The server is implemented in Java and is accessed remotely over RMI. The usual client is a console, also written in Java, which uses Python (an easy-to-learn interpreted language) as its command language. The server can also be accessed through an Eclipse plugin.

Server serialization

While running, the server holds information about registered objects, mappings and other data. This information must not be lost when the server is shut down or crashes, so the server state is synchronized to an SQL database. An external SQL server is not required, because embedded SQL servers written in Java exist. DwWorkflow uses the SmallSQL Java database, but for real applications a more reliable SQL server is recommended.
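The idea of writing every state change through to SQL can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SmallSQL, and the table name object_state, its columns, and both function names are hypothetical, not the DwWorkflow schema.

```python
import sqlite3

# In-memory database as a stand-in; a real server would use a file or a remote database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object_state (name TEXT PRIMARY KEY, n INTEGER, s INTEGER)"
)

def save_state(conn, name, n, s):
    """Write one object's state through to the database (hypothetical schema)."""
    conn.execute(
        "INSERT OR REPLACE INTO object_state (name, n, s) VALUES (?, ?, ?)",
        (name, n, s),
    )
    conn.commit()

def load_states(conn):
    """Rebuild the in-memory view after a restart."""
    rows = conn.execute("SELECT name, n, s FROM object_state")
    return {name: (n, s) for name, n, s in rows}
```

Because every change is committed immediately, a restarted server can call load_states and continue from the last recorded state.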

Server

The next figure shows the internal components.
The main component is the workspace component. At startup it loads the registered mappings and objects, together with their states, into memory. It then listens for commands received from the server component.

Objects

The main idea is that there are objects, such as tables or files, and these objects have states. The state of an object changes over time periods. A period is usually one day, and during that day an object can be processed by several processes. The state is therefore composed of two numbers: the period, n, and the state within the period, s. The number n only moves forward and means first day, second day and so on (a period can also be something other than a day). The number s is the state of the object within a period, and it cycles within each period. The value 10000 means that processing of the object has finished successfully.
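A state made of n and s can be sketched as a small value type. The class name State, its methods, and the assumption that s starts at 0 in each new period and only grows inside it are illustrative choices, not the server's actual classes; only the value 10000 comes from the text.

```python
from dataclasses import dataclass

OK = 10000  # from the text: the object is successfully processed

@dataclass(frozen=True, order=True)
class State:
    n: int  # period counter, only moves forward
    s: int  # state within the period, starts over in every period

    def advance(self, s: int) -> "State":
        """Move to a later state inside the same period."""
        if not self.s < s <= OK:
            raise ValueError("s must grow within a period and not exceed OK")
        return State(self.n, s)

    def next_period(self, s: int = 0) -> "State":
        """Open period n + 1; s starts over (assumed to start at 0)."""
        return State(self.n + 1, s)

    @property
    def done(self) -> bool:
        """True when the object is finished for its period."""
        return self.s == OK
```

Since the fields are compared in order, any state in period n + 1 is greater than every state in period n, which matches the rule that n only moves forward.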

The server does not know about the real objects such as tables or files. Each real object only has a counterpart in the server: an object with a name and a state.

Mappings

Mappings are the means of shifting an object's state forward. For example, suppose a table is loaded each day with data from a primary system. Some procedure or program must then insert the data from the primary system into the table. This procedure actually changes the state of the loaded table, and the state goes through the following steps:
  1. the object contains data from day n - 1 - state (n-1, OK)
  2. the procedure is started - the state of the object is "started being processed on day n" - state (n, STARTED)
  3. the procedure finished successfully - the object contains data from day n - state (n, OK)
This flow through the states is simplified; the implementation is somewhat more complicated.
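The three steps above can be sketched as a function that records every state the object passes through. The function name run_mapping and the numeric code chosen for STARTED are assumptions made for illustration; only the value 10000 for OK comes from the documentation.

```python
STARTED = 1   # hypothetical code for "started being processed"
OK = 10000    # from the text: successfully processed

def run_mapping(state, procedure):
    """Take an object from (n-1, OK) to (n, OK) by running `procedure`.

    `state` is a (period, s) tuple. Returns the list of states the
    object passed through, in order.
    """
    period, s = state
    if s != OK:
        raise ValueError("the previous period is not finished")
    trace = [state]
    trace.append((period + 1, STARTED))  # step 2: the procedure is started
    procedure()                          # the external work happens here
    trace.append((period + 1, OK))       # step 3: finished successfully
    return trace
```

If the procedure raises an exception, the sketch simply propagates it; how a failed run is recorded is left out, in the same spirit as the simplified flow.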

A mapping in the server is an object that has a name, knows its target object, and knows how to run the external procedure that does the actual work.

Dependencies

Procedures usually take data from some sources, and these sources must be ready for the given period. In other words, mappings depend on objects. If the data warehouse is being processed for period n, a mapping can depend on an object's state either at n - 1 or at n. A dependency at n - 1 is used for the start mapping, which can run once the previous run of the data warehouse has finished.
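A dependency check along these lines might look as follows. The names Dependency, Mapping and is_ready, and the representation of a state as a (period, s) tuple, are hypothetical; the sketch only shows how an offset of 0 or -1 selects the period a source must have finished.

```python
from dataclasses import dataclass, field

OK = 10000  # from the text: successfully processed

@dataclass
class Dependency:
    source: str  # name of the object the mapping reads
    offset: int  # 0 = source needed for period n, -1 = for period n - 1

@dataclass
class Mapping:
    name: str
    target: str
    depends_on: list = field(default_factory=list)

def is_ready(mapping, n, states):
    """True when every source has reached (n + offset, OK) or a later state.

    `states` maps object names to (period, s) tuples.
    """
    for dep in mapping.depends_on:
        if states[dep.source] < (n + dep.offset, OK):
            return False
    return True
```

For example, a start mapping with a single dependency at offset -1 becomes ready for period 5 as soon as its source reaches (4, OK).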

Running of mappings

Mappings do not run automatically as soon as their dependencies are satisfied, because a developer's mistake could then trigger uncontrolled runs. It must be possible to check which mappings will run, especially while the workflow is being changed. Mappings must therefore be planned. The user does not plan every mapping individually but only requests that one object be shifted to some state. This request plans one mapping, which recursively plans further mappings until all the required mappings are planned.
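Recursive planning can be illustrated with a short depth-first walk over the dependencies. The function plan and its dictionary arguments are hypothetical and much simpler than what a real server needs; in particular, periods and partial states are ignored, and an object counts simply as finished or not.

```python
def plan(target, producers, sources, finished, planned=None, seen=None):
    """Collect the mappings needed to shift `target`, sources first.

    producers: object name -> name of the mapping that produces it
    sources:   mapping name -> names of the objects the mapping reads
    finished:  names of objects already in the requested state
    """
    planned = [] if planned is None else planned
    seen = set() if seen is None else seen
    mapping = producers.get(target)
    # Nothing to plan for finished objects, external sources,
    # or mappings that were already visited.
    if target in finished or mapping is None or mapping in seen:
        return planned
    seen.add(mapping)
    for obj in sources.get(mapping, []):
        plan(obj, producers, sources, finished, planned, seen)
    planned.append(mapping)  # every source is planned before its consumer
    return planned
```

The returned list can be shown to the user before anything runs, which is the point of planning: the effect of a request is visible in advance.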

Even after planning, the mappings need not run, because the server can be in the stopped state. In that state the server carries out commands issued by a user but does not run planned mappings.
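The difference between planning and running can be sketched with a queue and a flag. The class Scheduler and its methods are illustrative names only, not the server's interface.

```python
class Scheduler:
    """Minimal sketch: planning is always accepted, running needs a started server."""

    def __init__(self):
        self.running = False  # assumed here: the server begins in the stopped state
        self.planned = []

    def plan(self, mapping):
        """User command; accepted even while the server is stopped."""
        self.planned.append(mapping)

    def tick(self, run):
        """Run all planned mappings, but only while the server is started."""
        if not self.running:
            return []
        batch, self.planned = self.planned, []
        for mapping in batch:
            run(mapping)
        return batch
```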

Failing or stopping of mappings

It is common for mappings to fail. The mappings that follow a failed mapping then cannot run. The cause of the failure must be removed, and the server must then be told that it can try to run the mapping again.

It is also possible to tell the server that an object may be shifted only up to a certain state. This potentially stops some mappings from running. It is often used during development or when installing new features.
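Both mechanisms, blocking after a failure and capping an object's state, amount to an extra check before a mapping is allowed to run. The sketch below shows one possible shape of that check; the class RunGuard, its methods, and the (period, s) tuples are assumptions, not the server's commands.

```python
class RunGuard:
    """Decides whether a planned mapping is allowed to run (illustrative only)."""

    def __init__(self):
        self.failed = set()  # mappings whose last run failed
        self.limits = {}     # object name -> highest (period, s) it may reach

    def mark_failed(self, mapping):
        self.failed.add(mapping)

    def retry(self, mapping):
        """The user has removed the cause; allow another attempt."""
        self.failed.discard(mapping)

    def set_limit(self, obj, state):
        self.limits[obj] = state

    def can_run(self, mapping, target, new_state):
        """False for failed mappings and for shifts beyond the target's limit."""
        if mapping in self.failed:
            return False
        limit = self.limits.get(target)
        return limit is None or new_state <= limit
```

A limit of (4, 10000) on a table, for instance, lets the load for period 4 finish but keeps the load for period 5 from starting.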

Reporting

The server provides functions for monitoring the server state. Users can also create their own reports based on the tables in the SQL server.

Console and Python

The console uses the JLine library, so command history is available. Knowing Python is not necessary to use the console, but Python offers much more flexibility.