Friday 8 May 2015

AIF process stuck / never ending (due to AIFResponse deletion)

Scenario of the issue
  • Created a document service (e.g. TableA) and an inbound port to test it
  • While running the test, the receive step (AifGateWayReceiveService) is OK, but the processing step (AifInboundProcessingService) seems to get stuck and never finishes

In detail
  • Start two AX clients, e.g. ClientA and ClientB.
  • ClientA: Get the inbound port (file adapter) and the XML file ready
  • Drop the XML file into the inbound folder
  • ClientA: Run a simple job that contains this line: new AifGateWayReceiveService().run();
  • ClientA: Check the "Queue manager" (Path: System administration > Periodic > Service and AIF) and confirm the message has gone in
  • ClientB: Run another simple job that contains this line: new AifInboundProcessingService().run();
  • ClientA: Check the "Queue manager" again and confirm the line is gone and a new record has been inserted into the table used by the document service (e.g. TableA)
  • ClientB: The job does not seem to finish; AX is frozen and not responding

Below is a sample of the simple job used in the steps above (toggle the two flags to choose which step runs).

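// The job name below is only a placeholder.
static void AifRunInboundTest(Args _args)
{
    // Toggle these flags to choose which AIF step this run performs.
    boolean runReceive = true,
            runProcess = false;

    if (runReceive)
    {
        // Read messages from the inbound port into the AIF queue.
        new AifGateWayReceiveService().run();
    }

    if (runProcess)
    {
        // Process the inbound messages waiting in the queue.
        new AifInboundProcessingService().run();
    }
}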


Investigation result
The frozen client isn't actually hung. It is running a record deletion that takes ages, which is why it stops responding.

Below is a Trace Parser screenshot of a trace file captured over about 30 seconds.

Within this short period, there are thousands of calls to the database server for a DELETE statement.

This DELETE statement comes from AIF, which tries to delete the expired AIFResponse records.



This DELETE statement is supposed to delete any record older than the expiry datetime (the current datetime minus the AIFGlobalSettings.ResponseCacheLifetime value). A 24-hour lifetime is a common value. But this table contains millions of records (it holds up to 2 years of data), so almost every record is expired.
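
To make the expiry rule concrete, here is a minimal Python sketch of the calculation described above. It is not the AX code: the function names are made up for illustration, and it assumes the cache lifetime is expressed in hours (24 in this example).

```python
from datetime import datetime, timedelta


def expiry_cutoff(now, cache_lifetime_hours):
    """Return the datetime before which a response record counts as expired.

    Assumption: the lifetime is given in hours.
    """
    return now - timedelta(hours=cache_lifetime_hours)


def is_expired(created, cutoff):
    """A record is expired when it was created before the cutoff."""
    return created < cutoff


# Example: evaluated at noon on the day of this post with a 24-hour lifetime.
now = datetime(2015, 5, 8, 12, 0, 0)
cutoff = expiry_cutoff(now, 24)

old_record = datetime(2013, 6, 1, 0, 0, 0)     # roughly two years old
fresh_record = datetime(2015, 5, 8, 9, 0, 0)   # created three hours ago
```

With these values the cutoff is noon on the previous day, so the two-year-old record is expired and the three-hour-old one is not. On a table where nearly everything is years old, the cleanup ends up touching almost every row.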

The solution is to get rid of this data directly in SQL. Since this is a development environment, truncating the table would do. In a production environment, the deletion would need more care (performance, when the deletion runs, recovery model and backup chain, etc.).
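
For the production case, one common approach is to delete the expired rows in small batches so that each transaction stays short. The sketch below only illustrates the pattern. It uses Python's built-in sqlite3 module as a stand-in for the real database, and the table name response_cache, the column created and the function name are all hypothetical, not the actual AIFResponse schema.

```python
import sqlite3


def delete_expired_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows older than `cutoff` in batches; return the total deleted.

    Each batch is committed on its own so no single transaction grows large.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM response_cache WHERE rowid IN ("
            "SELECT rowid FROM response_cache WHERE created < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            break  # nothing left to delete
        total += cur.rowcount
    return total


# Demo data: 2500 old rows and 3 recent ones (ISO strings sort by date).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE response_cache (id INTEGER PRIMARY KEY, created TEXT)")
rows = [("2013-06-01T00:00:00",)] * 2500 + [("2015-05-08T09:00:00",)] * 3
conn.executemany("INSERT INTO response_cache (created) VALUES (?)", rows)
conn.commit()

deleted = delete_expired_in_batches(conn, "2015-05-07T12:00:00", batch_size=1000)
remaining = conn.execute("SELECT COUNT(*) FROM response_cache").fetchone()[0]
```

On SQL Server the same idea is usually written as a loop around DELETE TOP (n). The batch size and the time of day to run it depend on the environment and should be agreed with whoever owns the backups.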