Maximo - Inserting Records by Sequence Number While Merging dB

From ExxtremeWiki

Merging Maximo Databases

Merging Maximo databases carries its own problems, especially if you have thousands of records and you want to avoid problems with Maximo sequences. This isn't much of an issue on Oracle, where a "sequence" is a native database object that doesn't appear to have a practical upper limit, but it can be a real pain in SQL Server.
SQL Server doesn't have a built-in sequence function (at least not in 2000 through 2008, that I know of). To work around this, PSDI (MRO Software/IBM) created the "MaxSequence" table and recorded the current sequence value there for each table where it was used.
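For reference, the sequence bookkeeping can be inspected like this. Only the columns referenced later in this article are selected; other columns exist, and the exact schema may vary by Maximo release, so verify against your own system:

```sql
-- Peek at the sequence bookkeeping for one table (ItemSpec as the example).
-- Only columns used elsewhere in this article are shown; check your release's
-- actual maxsequence schema before relying on this.
SELECT tbname,       -- name of the Maximo table the sequence feeds
       maxreserved,  -- high end of the block of IDs currently reserved
       maxvalue      -- used on some systems once maxreserved is exhausted
  FROM dbo.maxsequence
 WHERE tbname = 'ItemSpec'
```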
You can merge the data between databases while one of them is still live in "Production" if you're careful about a couple of rules.
  1. Each member of the cluster grabs 100 records' worth of IDs into memory and updates the appropriate value in the MaxSequence table. As long as you don't step on those 100 records (and we gave ourselves the benefit of the doubt by blocking out the following 100 records as well, just in case Maximo had already used 99 of them!) you can insert records into the database on the fly.
  2. Any time you go past the 100 reserved numbers you have to update MaxSequence quickly, before Maximo uses them and you run into an index violation error.
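Rule 2 might be sketched like this, assuming the highest newly assigned ID is at hand (the table name TEMP_ItemSpec_SITE1 and the variable @maxNewID are illustrative, not from the original scripts):

```sql
-- Sketch of rule 2: push the reservation past everything we just used, plus
-- the 100-record block a live cluster member may already hold (rule 1).
-- TEMP_ItemSpec_SITE1 and @maxNewID are illustrative names.
DECLARE @maxNewID int
SELECT @maxNewID = MAX(ItemSpecID_NEW) FROM TEMP_ItemSpec_SITE1

UPDATE dbo.maxsequence
   SET maxreserved = @maxNewID + 100
 WHERE tbname = 'ItemSpec'
   AND maxreserved < @maxNewID + 100
```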
We copied the records into a temporary table, then populated a "sequenceID_new" field in that table. Once that was done we compared the current sequence value with the maximum new ID value and, if necessary, updated the MaxSequence table at that point. Then, with all our sequences already figured out and the space allocated, we copied the records into the target table.
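The staging copy could be sketched as follows, assuming the source database is reachable from the target server (SourceDB and the _SITE1 suffix are illustrative names; the ROWNUM column is included because the CTE update further down joins on it):

```sql
-- Sketch of the staging step: pull the incoming rows into a temporary table
-- and add columns for the re-assigned ID and a join key.
-- SourceDB and TEMP_ItemSpec_SITE1 are illustrative names.
SELECT s.*,
       CAST(NULL AS int) AS ItemSpecID_NEW,
       ROW_NUMBER() OVER (ORDER BY s.ItemSpecID) AS ROWNUM
  INTO TEMP_ItemSpec_SITE1
  FROM SourceDB.dbo.ItemSpec s
```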
Initially we used a counter and a loop to cycle through all the possible numbers, looking for a gap in the sequence to insert each record into. Once a gap was found and the ID field updated we moved on to the next record, running through all the records and then updating the sequence table. This proved to be too slow whenever the process exceeded the available slots and we had to update the MaxSequence number; we'd end up having to restart the production system multiple times during the loads (once for each table that exceeded the max sequence number, as each load finished).
The solution was to use a construct unique to SQL Server (I'm sure there is something like it in other databases): the "Common Table Expression", abbreviated here as "CTE" (specifically "myCTE" and "myCTE2"), and yes, it really does start with a semicolon ";".


;WITH myCTE
AS (SELECT NUMBER
	FROM 
	(
	SELECT top 1000000 ROW_NUMBER() OVER(ORDER BY t1.NUMBER) AS NUMBER
		FROM master..spt_values t1 
		CROSS JOIN master..spt_values t2
	) temp 
	WHERE NOT EXISTS (SELECT NULL FROM '+@dBRef+'.dbo.ItemSpec m WHERE m.ItemSpecID = NUMBER)
	AND NOT EXISTS (SELECT NULL FROM '+@dBRef+'.dbo.maxsequence m1 WHERE tbname = 'ItemSpec' AND NUMBER BETWEEN maxreserved - 100 AND maxreserved + 100)
)
, myCTE2 AS 
	(SELECT NUMBER AS ItemSpecID_NEW, ROW_NUMBER() OVER (ORDER BY NUMBER) AS RowNum
		FROM myCTE
	)
UPDATE TEMP_ItemSpec_'+@siteExt+'
   SET ItemSpecID_NEW = B.ItemSpecID_NEW
  FROM TEMP_ItemSpec_'+@siteExt+' A
 INNER JOIN myCTE2 B ON A.ROWNUM = B.ROWNUM


The above example is for the ItemSpec table. You will notice references to variables (@dBRef and @siteExt); the query won't work as shown if you just run it with those variables spliced in (warning <grin>). You need to embed the whole query in a string and then execute the string; something to be aware of in your coding. Here is an example of using EXEC on an embedded SQL string:


SET @SQL = 'IF OBJECT_ID(''dbo.TEMP_ItemSpec_'+@siteExt+''') IS NOT NULL drop table TEMP_ItemSpec_'+@siteExt+''
EXEC(@SQL)


Of course someone is going to say exec() is deprecated (I don't think it is, and since we're not writing code for an app but performing direct operations on the database, we don't have to worry about SQL injection). For the diehards out there, here is another example done the "new" way:


-- Are we using MaxValue?
SET @uMaxValue = 0  -- default to MaxReserved if the query below returns no row
SET @SQL = N'select @uMaxValue = 1 from '+@dBRef+'.dbo.maxsequence where tbname = ''ItemSpec'' and maxvalue is not null'
IF @debug = 1 print @SQL
EXEC sp_executesql @SQL, N'@uMaxValue integer output', @uMaxValue output
 
-- Get current value
IF @uMaxValue = 0 SET @SQL = N'select @maxSeqID = max(abs(maxreserved)) from '+@dBRef+'.dbo.maxsequence where tbname = ''ItemSpec'''
IF @uMaxValue = 1 SET @SQL = N'select @maxSeqID = max(abs(maxvalue)) from '+@dBRef+'.dbo.maxsequence where tbname = ''ItemSpec'''
IF @debug = 1 print @SQL
EXEC sp_executesql @SQL, N'@maxSeqID integer output', @maxSeqID output


These examples matter because this was the only way I could get variables passed into the string and then back out from the execution of the SQL query itself. In the "Are we using MaxValue" example we got a 1 if MaxValue was being used instead of MaxReserved (MaxReserved had "max'd out" <grin>). We then used that value in an "IF" statement to set up our select string for the second part (the N'' prefix marks a Unicode string literal, which is required when working with nvarchar fields and when passing statements to sp_executesql). We queried for either MaxReserved or MaxValue depending on the output of the previous query, and returned that value as @maxSeqID for further testing and then, if necessary, the update of the appropriate value.
Finally, after crunching the numbers and performing whatever updates to the temporary tables we need to make the incoming data match the destination system configuration, we copy the new data into the target Maximo system. Note that the ItemSpecID_new field is copied into the ItemSpecID field in the target.


SET @SQL = 'INSERT INTO '+@dBRef+'.dbo.ItemSpec
      (itemnum,assetattrid,rotating,classstructureid
		 , alllocspecusevalue,displaysequence,numvalue,measureunitid
		 , alnvalue,changedate,changeby,orgid,allasspecusevalue
		 , itemsetid,mandatory,ITEMSPECid)
		SELECT  itemnum,assetattrid,rotating,classstructureid
		 , alllocspecusevalue,displaysequence,numvalue,measureunitid
		 , alnvalue,changedate,changeby,orgid,allasspecusevalue
		 , itemsetid,mandatory,ITEMSPECid_new
  FROM TEMP_ItemSpec_'+@siteExt
IF @debug = 1 print @SQL
EXEC(@SQL)


Performance Information

Will try to get an example of the looping code that we were using previously up here as well. There is NOTHING wrong with that code or how it was implemented; it just wasn't performance-centric. Whenever you try to loop through thousands of records it will take time. Looping is a "bad" way of thinking about large changes to a database: get out of "row" mode and start thinking in terms of "table" (set-based) updates. A loop is perhaps fine for a smaller subset of records, or for selecting from a smaller table and then performing actions based on the selection, or even as a "quick and dirty" fix.
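Until that looping code is posted, here is a reconstruction of the general shape it took. The original script is not shown in this article, so the structure and names here are illustrative (they follow the ItemSpec examples used earlier), not the actual production code:

```sql
-- Reconstruction of the row-at-a-time gap hunt described above;
-- illustrative only, the original script is not reproduced in this article.
DECLARE @candidate int, @oldID int
SET @candidate = 1

DECLARE rowCur CURSOR LOCAL FAST_FORWARD FOR
    SELECT ItemSpecID FROM TEMP_ItemSpec_SITE1

OPEN rowCur
FETCH NEXT FROM rowCur INTO @oldID
WHILE @@FETCH_STATUS = 0
BEGIN
    -- probe one ID at a time until we find a free slot in the target table
    WHILE EXISTS (SELECT 1 FROM dbo.ItemSpec WHERE ItemSpecID = @candidate)
        SET @candidate = @candidate + 1

    UPDATE TEMP_ItemSpec_SITE1
       SET ItemSpecID_NEW = @candidate
     WHERE ItemSpecID = @oldID

    SET @candidate = @candidate + 1
    FETCH NEXT FROM rowCur INTO @oldID
END
CLOSE rowCur
DEALLOCATE rowCur
```

One probe per candidate number is exactly why this approach ran for hours on large tables, while the set-based CTE version finished in seconds.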
That said: loading MatUseTrans with about 178,000 records took 2:57 (2 hours and 57 minutes) with the "looping by record" code. Loading the same table using the CTE method described above took under 5 seconds (yes, five (5) seconds; about 4.378 seconds, i.e. 4 seconds 378 milliseconds, on the first run). Grouping the loads, the longest time taken so far has been for PoStatus - PoHist - PoLineHist - PoCostHist - PrStatus - PrHist - PrLineHist - PrCostHist, all in one SQL script; that run time was just slightly more than... yes... 5 minutes 10 seconds.

"Total processing time for the PoStatus - PoHist - PoLineHist - PoCostHist - PrStatus - PrHist - PrLineHist - PrCostHist Build Script: 00:05:10:817 hh:mm:ss:mmm"




--Will (talk) 21:38, 21 February 2013 (CST)