Hardware Inventory Implosion After v1610 Upgrade!
Note: This post is adapted from my working notes, so I apologize for being a little all over the place. I didn’t find this issue described online, so I thought it was important to get something posted to hopefully save someone else the trouble.
Naturally, my first routine servicing upgrade caused an implosion of hardware inventory across the hierarchy. My first indication of an issue was SMS_MP_CONTROL_MANAGER showing a warning status in the console for all MPs. The logs were full of this:
I confirmed that virtually all clients had last submitted hardware inventory the night of the v1610 upgrade. My clients are set to inventory nightly, so something has to give.
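For anyone wanting to run the same check against an export of last-scan dates, here is a minimal Python sketch. It assumes you already have each client's last hardware scan time in a dictionary (how you export that is up to you); the function name and input shape are my own illustration, not anything ConfigMgr provides.

```python
from datetime import datetime


def stale_clients(last_scan_by_client, cutoff):
    """Return sorted client names whose last hardware scan is missing
    or no later than the cutoff (for example, the upgrade night)."""
    return sorted(
        name
        for name, scanned in last_scan_by_client.items()
        if scanned is None or scanned <= cutoff
    )
```

If nearly every client comes back in that list, the problem is on the server side rather than with individual machines.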
I went to a client and initiated a full hardware inventory in Client Center. I confirmed that InventoryAgent.log showed it successfully collecting Hinv and submitting it to the MP.
So clients are submitting inventory to the MP, but it’s not being processed properly. Let’s look at a Management Point.
Checking out (installpath)\SMS_CCM\Logs\MP_Hinv.log, it’s loaded up with these:
OK…. So there’s the date error. This has some discussion around the internet (thanks Google) but I don’t see anyone saying it’s forcing their hardware inventory to cease…
The “cache is still obsolete” is probably related to our issue. Unlike a lot of error messages, I can’t find anything specific online.
It says it is making a retry file with this Hinv submission. Let’s see how bad the retry files are. Looking at (installdirectory)\inboxes\auth\dataldr.box\retry\
Not good. 5200+ files. I quickly check my other MPs and find the same.
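Counting by hand across several MPs gets old quickly. A rough Python sketch of the same check is below; the function name is mine, the "RetryHinv" prefix is taken from the file names I saw, and you pass in the full path to your own retry folder.

```python
from pathlib import Path


def count_retry_files(retry_dir, prefix="RetryHinv"):
    """Count regular files in retry_dir whose names start with prefix.

    The comparison is case-insensitive; subfolders are ignored.
    """
    wanted = prefix.lower()
    return sum(
        1
        for entry in Path(retry_dir).iterdir()
        if entry.is_file() and entry.name.lower().startswith(wanted)
    )
```

Run it against each server's retry folder and compare the numbers over a few minutes to see whether the backlog is growing.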
Going back to the original error about reloading the hardware inventory class mapping table. Our Hardware Inventory is extended to include LocalGroupMembers (Sherry K.) and additionally I’ve enabled the default Bitlocker class. My impression here is that the clients are submitting these extra classes, but the site servers aren’t expecting them now.
There’s an easy way to test this… let’s take the error at face value and “trigger a HINV policy change.” I hopped onto the default Client Settings policy, disabled the LocalGroupMembers class, waited a few minutes, and then re-enabled it.
Giving my nearest Primary a break here – I move all of the RetryHinv files from \inboxes\auth\dataldr.box\retry to a temp folder called “old”.
New diagnostic clue: after this change, the “obsolete cache” errors stop appearing in the MP logs. Additionally: no more retry files are being generated. I take 8 RetryHinv files and paste them back into the retry directory. After about 10 minutes, all of them disappear. Dataldr.log shows this:
I check the process folder and they’re gone; they’ve been dealt with. Fantastic. 5,000 to go. I cut 500 of these back into the retry directory. I suspect a number of these will be rejected because they are now too out of date. This is confirmed by some of them being moved to the delta mismatch directory.
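Moving the files aside and feeding them back in batches can be scripted. This is only a sketch of what I did by hand: the function name, the "RetryHinv" prefix default, and the limit parameter are my own choices, and both folder paths are whatever you supply.

```python
import shutil
from pathlib import Path


def move_batch(src_dir, dst_dir, prefix="RetryHinv", limit=None):
    """Move up to `limit` files whose names start with `prefix`
    from src_dir to dst_dir. limit=None moves all of them.

    Returns the names of the files that were moved.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    wanted = prefix.lower()
    candidates = sorted(
        p for p in src.iterdir()
        if p.is_file() and p.name.lower().startswith(wanted)
    )
    if limit is not None:
        candidates = candidates[:limit]
    moved = []
    for path in candidates:
        shutil.move(str(path), str(dst / path.name))
        moved.append(path.name)
    return moved
```

Calling it once with no limit parks everything in the holding folder; calling it in the other direction with limit=8 or limit=500 reproduces the small test batches before committing the rest.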
Look at that. I verified that these ran through the process folder OK. I checked the BADMIFs directories to make sure I didn’t have 500 rejected MIFs. Only a few marked as delta mismatch. I’m guessing that’s not too bad considering that these machines have been submitting and hung up completely since the 27th. I move the remaining retry files back into the retry directory….
Caught in the act- the 4,77x RetryHinv files are disappearing in front of me. Looks like they are converted from HML to .MIF and then placed back in the dataldr.box directory. This directory ballooned up and the logs are going nuts.
Processed in a couple of batches- “Finished processing 2048 MIFs SMS_INVENTORY_DATA_LOADER”
There are about 1,000 “deltamismatch” in BADMIF. This is almost certainly systems that have submitted multiple delta reports that have been caught in the RETRY queue for the past week. Not surprising.
I checked all other inboxes to verify I don’t have backlogs anywhere else.
In summary, the “Obsolete Cache” error looks to have generated a Retry for every client hardware inventory submission. There was this long loop because every inbound Hinv generated a retry and every retry failed (and generated a replacement retry). This explains behavior I saw earlier: all of the retry files were continuously having their “date modified” updated to within a few minutes of each other (and no more than about 15 minutes from current time). So, in short, the dataldr inbox was stuck in an endless loop trying to process Hinv submissions.
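The "date modified" pattern is a handy symptom to test for. A minimal Python sketch, assuming the 15-minute window I observed and a function name of my own invention:

```python
import time
from pathlib import Path


def retry_files_churning(retry_dir, window_minutes=15, now=None):
    """Return True if the folder contains files and every one of them
    was modified within the last `window_minutes`.

    A large backlog where nothing is older than the window suggests the
    files are being rewritten in a loop rather than simply waiting.
    """
    now = time.time() if now is None else now
    cutoff = now - window_minutes * 60
    mtimes = [
        p.stat().st_mtime for p in Path(retry_dir).iterdir() if p.is_file()
    ]
    return bool(mtimes) and min(mtimes) >= cutoff
```

A week-old backlog that still returns True here is a strong hint that the inbox is looping.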
The issue was obviously caused during the upgrade. 50% of my clients are offline, so fixing something on the clients can’t be what got the server processing the retries (and new inbound submissions) without error. No, updating the Client Policy must have replaced a configuration file or setting somewhere that corrected the issue.
I can’t be more specific than that at this point, but I’ve got a grasp on the situation, it appears.
As expected, the date error is not a showstopper, just a warning. It also appears to be a common thing. I can visit it at a later time since it appears to have a simple fix. See description here: https://technet.microsoft.com/en-us/library/dn581927.aspx
36 hours later, almost all of my Active clients have Hardware Inventory Scan dates listed after the upgrade date.
Text copies of relevant messages for Google’s use:
MP needs to reload the hardware inventory class mapping table when processing Hardware Inventory.
Hinv: MP reloaded and the cache is still obsolete.
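If you want to check your own MP_Hinv.log for these two messages, a small Python sketch follows. The function and constant names are mine, the search strings are the messages above, and the encoding handling is a guess meant to tolerate whatever the log contains.

```python
SIGNATURES = (
    "MP needs to reload the hardware inventory class mapping table",
    "MP reloaded and the cache is still obsolete",
)


def count_signatures(log_path, signatures=SIGNATURES):
    """Count how many lines of the log contain each signature string."""
    counts = {s: 0 for s in signatures}
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            for s in signatures:
                if s in line:
                    counts[s] += 1
    return counts
```

Non-zero counts that keep climbing between runs would match what I saw before the policy change.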