Authors: Rob van Maris (Finalist IT Group)
Last revision: January 18th, 2002
The most important new concept introduced by the XML Importer is merging objects. The XML Importer code handles most of the details for you; to put it to work, all you have to do is provide implementations for two interfaces, SimilarObjectFinder and ObjectMerger.
The XML Importer provides basic implementations for both of these, but some additional work will be necessary to meet your needs.
In this document we'll have a look at some of the issues involved, and give some guidelines.
When we have populated a transaction with (access and input) objects, we can merge all objects of a given type. In order to do so, the XML Importer performs these actions:
1. Walk through the list of objects of the given type in the transaction, in the order they were added to the transaction.
2. For each such object, look for a similar object.
3. If a similar object is found, merge both objects into a single object.
4. If more than one similar object is found, the transaction cannot proceed, unless the user can choose the object to merge with.
The SimilarObjectFinder is needed to implement step 2. One fairly general way to do this is implemented by BasicFinder, which makes a distinction between exact matches (i.e. indistinguishable objects) and non-exact matches (i.e. objects that differ, but are considered the same based on some specified criteria, as in a fuzzy comparison):
1. Walk through the list of objects that were added to the transaction before this one, compare these with this object, and keep the results as a list of exact matches and a list of non-exact matches.
2. Look for exact matches in the persistent cloud.
3. If exact matches were found in step 1 or 2, these are returned as the result of the search.
4. Otherwise, look for objects in the persistent cloud that are close enough to warrant further inspection, and compare these with this object.
5. If non-exact matches were found in step 1 or 4, these are returned as the result of the search.
This strategy has these characteristics:
- If the transaction introduces a number of similar objects, these are merged one by one, in the order they were added to the transaction.
- If an exact match is found, the non-exact matches are ignored.
- Searching the persistent cloud for non-exact matches occurs only if no exact match is found (performance optimization).
- Searching the persistent cloud for non-exact matches is performed in two parts: selecting objects that are close enough, followed by comparing these objects with this object (performance optimization, since this reduces the number of objects to be compared).
For examples of implementations based on BasicFinder, see MoviesFinder and PersonsFinder in the XML Importer examples code.
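To make the precedence of exact over non-exact matches concrete, here is a minimal, self-contained sketch of the search order described in this section. It does not use the XML Importer classes: the class FinderSketch, the Match enum, the use of plain strings as objects and the case-insensitive comparison rule are all assumptions made for illustration, and the persistent cloud is represented by an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not the BasicFinder source.
class FinderSketch {
    enum Match { EXACT, NON_EXACT, NONE }

    // Assumed comparison rule: equal strings are exact matches,
    // strings that differ only in case are non-exact matches.
    static Match compare(String a, String b) {
        if (a.equals(b)) return Match.EXACT;
        if (a.equalsIgnoreCase(b)) return Match.NON_EXACT;
        return Match.NONE;
    }

    // earlier: objects added to the transaction before obj.
    // cloud:   in-memory stand-in for the persistent cloud.
    static List<String> findSimilar(String obj, List<String> earlier, List<String> cloud) {
        List<String> exact = new ArrayList<>();
        List<String> nonExact = new ArrayList<>();

        // Step 1: compare with objects added earlier to the transaction.
        for (String other : earlier) {
            Match m = compare(obj, other);
            if (m == Match.EXACT) exact.add(other);
            else if (m == Match.NON_EXACT) nonExact.add(other);
        }
        // Step 2: look for exact matches in the persistent cloud.
        for (String other : cloud) {
            if (compare(obj, other) == Match.EXACT) exact.add(other);
        }
        // Step 3: exact matches win; non-exact matches are ignored.
        if (!exact.isEmpty()) return exact;

        // Step 4: only now search the cloud for non-exact matches.
        for (String other : cloud) {
            if (compare(obj, other) == Match.NON_EXACT) nonExact.add(other);
        }
        // Step 5: return whatever non-exact matches were found (possibly none).
        return nonExact;
    }

    public static void main(String[] args) {
        List<String> earlier = Arrays.asList("the matrix");
        List<String> cloud = Arrays.asList("The Matrix", "Alien");
        System.out.println(findSimilar("The Matrix", earlier, cloud));
        System.out.println(findSimilar("THE MATRIX", earlier, cloud));
    }
}
```

In the first call the exact match from the cloud is returned and the non-exact match in the transaction is dropped; in the second call no exact match exists, so both non-exact matches are returned and the caller would have to choose between them.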
The ObjectMerger is needed to implement merging two objects into a single object. In order to do so, the XML Importer performs these actions:
1. If one of the objects represents a persistent object, this object is made the merge target, i.e. the object that will hold the merge result.
2. The fields of both objects are merged; the resulting fields are set on the merge target.
3. The relations of both objects are merged; the resulting relations are set on the merge target.
4. If step 3 results in duplicate relations, the duplicates are deleted.
5. Of the two objects, only the merge target is retained; the object that is not the merge target is deleted.
6. If there was no similar object to merge with, this object will only be kept in the transaction if the ObjectMerger specifies so (see method isAllowedToAdd() in ObjectMerger).
A fairly general implementation is provided by BasicMerger, which has these characteristics:
- The fields of the merge target are unaffected (i.e. the merge result has the same fields as the merge target).
- The relations of both objects are moved to the merge result.
- Relations are considered duplicates when they are of the same type and have the same source and destination (in which case the duplicates are deleted).
- Objects for which no similar object is found are kept in the transaction.
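The handling of relations during a merge can be illustrated with a small, self-contained sketch. It is not the BasicMerger code: the class MergeSketch, its Relation type and the object ids are hypothetical, and the rule that the first object becomes the merge target when the second is not the only persistent one is an assumption, since this document only covers the case where one of the objects is persistent.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration only; this is not the BasicMerger source.
class MergeSketch {

    // A relation reduced to its type and the ids of the objects it connects.
    static final class Relation {
        final String type, source, destination;

        Relation(String type, String source, String destination) {
            this.type = type;
            this.source = source;
            this.destination = destination;
        }

        // Duplicates: same type, same source, same destination.
        @Override public boolean equals(Object o) {
            if (!(o instanceof Relation)) return false;
            Relation r = (Relation) o;
            return type.equals(r.type) && source.equals(r.source)
                    && destination.equals(r.destination);
        }

        @Override public int hashCode() {
            return Objects.hash(type, source, destination);
        }

        @Override public String toString() {
            return type + "(" + source + " -> " + destination + ")";
        }
    }

    // The persistent object becomes the merge target.
    // Assumption: in every other case the first object is used.
    static String chooseTarget(String a, boolean aPersistent, String b, boolean bPersistent) {
        return (bPersistent && !aPersistent) ? b : a;
    }

    // Moves all relations of 'other' to 'target'; the set drops duplicates.
    static Set<Relation> moveRelations(List<Relation> relations, String target, String other) {
        Set<Relation> result = new LinkedHashSet<>();
        for (Relation r : relations) {
            String source = r.source.equals(other) ? target : r.source;
            String destination = r.destination.equals(other) ? target : r.destination;
            result.add(new Relation(r.type, source, destination));
        }
        return result;
    }

    public static void main(String[] args) {
        String target = chooseTarget("newMovie", false, "movie42", true);
        List<Relation> relations = Arrays.asList(
                new Relation("related", "movie42", "person7"),
                new Relation("related", "newMovie", "person7"),
                new Relation("related", "newMovie", "person8"));
        System.out.println(target);
        System.out.println(moveRelations(relations, target, "newMovie"));
    }
}
```

After the merge, the two relations to person7 collapse into one, so the merge target ends up with two relations instead of three.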
- As a general rule, keep the transactions small.
- If a number of transactions involve merging with the same objects over and over, it is worthwhile to combine these into a single transaction.
- Within a transaction, the same object in the persistent cloud can be accessed repeatedly (e.g. <accessObject mmbaseId="12345" id="id12345">), provided the same id is used each time. This is handy when the XML file containing the TCP code is generated by a stylesheet transformation, where it can be hard to establish whether an object has already been accessed within the same transaction. Just use the mmbaseId to create a unique id, so the same id will be used when accessing the same object again.
- When it comes to performance and resources, merging objects can be expensive, so try to use the merging mechanism only when it is really needed. As an alternative, in many cases objects can be accessed directly using their MMBase id.
- If you want to see which objects are merged, set the logging priority for the class
Transaction to "debug", and look at the log output of the Transaction method
- Set the timeOut to a value sufficient for the transaction to be completed under normal circumstances. Keep in mind that transactions take longer to complete in interactive mode.
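The tip above about deriving the id from the mmbaseId can be sketched as follows. This is only an illustration of the naming scheme, assuming the "id" prefix used in the accessObject example; the class AccessIdSketch and its method are hypothetical, and a generator written as a stylesheet would express the same rule in its own language.

```java
// Hypothetical helper; only the "id" + mmbaseId scheme comes from the example above.
class AccessIdSketch {

    // The same mmbaseId always yields the same id, so repeated access
    // to one object within a transaction reuses a single id.
    static String idFor(int mmbaseId) {
        return "id" + mmbaseId;
    }

    public static void main(String[] args) {
        System.out.println(idFor(12345));
    }
}
```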
Merging objects can put a heavy load on the MMBase server and database, so it is important to be aware of the following performance issues.
- When two objects are merged, all their relations are added to the transaction in order to be processed. Try to avoid merging objects that have a large number of relations.
- When using SimilarObjectFinder to merge objects based on a fuzzy match criterion, the search for non-exact matches in the persistent cloud occurs in two stages:
  1. Query the database for objects that are close enough and access these objects in the transaction.
  2. Compare these with the original object.
  Step 2 is needed when a database query is not sufficient to compare the objects. Therefore try to make the query in step 1 as selective as possible, to avoid large numbers of unneeded objects in the transaction.
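The benefit of a selective first stage can be shown with a small sketch of the two-stage search. Everything in it is an assumption made for illustration: the class TwoStageSearchSketch is hypothetical, the persistent cloud is an in-memory list, the cheap pre-selection (same first letter, similar length) stands in for a database query, and the edit distance stands in for whatever fuzzy comparison an actual finder would apply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a two-stage fuzzy search.
class TwoStageSearchSketch {

    // Stage 1: cheap selection, standing in for a selective database query.
    // Assumed rule: same first letter (ignoring case), length within 2.
    static List<String> selectCandidates(String obj, List<String> cloud) {
        List<String> candidates = new ArrayList<>();
        for (String other : cloud) {
            if (obj.isEmpty() || other.isEmpty()) continue;
            boolean sameInitial = Character.toLowerCase(obj.charAt(0))
                    == Character.toLowerCase(other.charAt(0));
            if (sameInitial && Math.abs(obj.length() - other.length()) <= 2) {
                candidates.add(other);
            }
        }
        return candidates;
    }

    // Stage 2: more expensive comparison (Levenshtein edit distance).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // Only the candidates from stage 1 go through the expensive comparison.
    static List<String> findNonExact(String obj, List<String> cloud, int maxDistance) {
        List<String> result = new ArrayList<>();
        for (String candidate : selectCandidates(obj, cloud)) {
            int distance = editDistance(obj, candidate);
            if (distance > 0 && distance <= maxDistance) result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> cloud = Arrays.asList("Jon Smith", "John Smyth", "Jane Doe", "Alice");
        System.out.println(selectCandidates("John Smith", cloud));
        System.out.println(findNonExact("John Smith", cloud, 2));
    }
}
```

The more objects the first stage rejects, the fewer have to be pulled into the transaction and compared in the second stage.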
For a full understanding of the XML Importer, it is recommended to read the following documents, available on the MMBase website:
- TCP 1.0 documentation (see Temporary Cloud Project).
- XML Importer overview (see XML Importer Project).
- The javadoc documentation of the