Wednesday, December 9, 2009

Backup Exec 12.5 stall - e0008821

So every once in a while one of my policy-based disk-to-disk-to-tape backup jobs will just hang or stall forever. It's really nifty since it will go well beyond the auto-cancellation period (you know, the 'stop job if it takes longer than x hours' checkbox). The first few times it happened I wound up having to restart the whole backup server to get the job to cancel, since nothing I tried in the front-end GUI would fix it. After the reboot it would show the failed job with a generic error code of e0008821. I even tried restarting the server that the job status claimed it was working on at the time of the hang. After a few more tries it occurred to me that the job status was lying to me and that it may actually be trying to communicate with the next server on the backup list. So to the command prompt I went. The results of netstat -aon | more

showed that the backup server was communicating with port 10000 on another server. The catch is that while the job status said it was working on server A, netstat showed it was only talking to server B at the time. So I went to server B and restarted the backup agent, and my stalled job suddenly started proceeding again. Of course, this isn't a great long-term fix, but it's not unusual for backup agents to need a good kick-start from time to time. It would be nice if it were better at auto-recovery...
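If you don't feel like paging through netstat output by eye, the same check can be roughly scripted. The sketch below is only an illustration, not anything Backup Exec ships: the function name find_agent_connections and the sample addresses and PIDs are made up, and it assumes the usual five-column TCP lines that Windows prints for netstat -aon. It keeps only the connections whose remote port is 10000, which is the server the stalled job is really talking to.

```python
# Rough sketch: pick out connections to remote port 10000 from the text
# that "netstat -aon" prints on Windows. Names and sample data are hypothetical.

AGENT_PORT = 10000  # the port seen in the netstat output described above


def find_agent_connections(netstat_output, port=AGENT_PORT):
    """Return (remote_host, state, pid) for TCP lines whose foreign port matches."""
    connections = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP lines have five columns: Proto, Local, Foreign, State, PID.
        # Headers and UDP lines (which have no State column) are skipped.
        if len(fields) != 5 or fields[0] != "TCP":
            continue
        _proto, _local, foreign, state, pid = fields
        # Split on the last colon so bracketed IPv6 addresses still work.
        host, _, foreign_port = foreign.rpartition(":")
        if foreign_port == str(port):
            connections.append((host, state, pid))
    return connections


# Made-up sample in the shape of "netstat -aon" output.
SAMPLE = """
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:10000          0.0.0.0:0              LISTENING       4321
  TCP    10.0.0.5:49231         10.0.0.7:10000         ESTABLISHED     1234
  UDP    0.0.0.0:123            *:*                                    900
"""

for host, state, pid in find_agent_connections(SAMPLE):
    print(f"talking to {host} on port {AGENT_PORT} ({state}, PID {pid})")
```

On the backup server itself you would feed it the real output instead of the sample, for example by saving netstat -aon to a file and reading that in. Whatever host comes back is the one whose agent is worth restarting first.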

5 comments:

Sean said...

Thank you! We have had EXACTLY that same issue with our BE process. Investigating led us to find that the next server in line (Exchange) had refused to talk. We restarted the BE service, and presto!
You have indeed saved us countless hours of hair loss.
Kudos to you and your blog :)

James McKey said...

I'd like to get some debugs on this (via SGmon) from both sides (media server & remote) to have someone on our Backup Exec Dev engine team take a look at. Can you @ message me on our Twitter account? (@backupexec)

Unknown said...

I have had this exact issue for about 3 weeks now and am about to pull my hair out! I have just tried your solution to see if it will work; I will report back tomorrow.

Teche said...

While an Exchange full backup was running, the Backup Exec server was restarted unexpectedly. Since then, any job that runs (even an inventory or eject) just queues, and if you try to cancel it, the cancel stays pending.
netstat shows: TCP 0.0.0.0:10000 0.0.0.0:0

I am afraid to restart the Backup Exec agent on the Exchange server because we can't afford any downtime if something goes wrong.

Exchange VSS also needs a restart.
We have scheduled downtime tonight and we will restart the Exchange server then.

Teche said...

The steps below worked for me:

Hello,

This happens because the BE software is not able to communicate with the tape drive.

Resolution:
1 : Open Backup Exec Console and go to the Media TAB...
2 : Right Click on the SERVER NAME and Click on "PAUSED"
3 : Right Click and click on "UNPAUSED"...

Then try any utility job, like Inventory or Erase.