Dump Analysis Series, Part 1: wdf01000.sys Blue Screen Caused by Misused Debug Switches

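Recently I (@360蓝屏分析专家) have received many user reports of blue screens involving wdf01000.sys. Wdf01000.sys is the Microsoft-supplied runtime library for framework-based drivers (Kernel Mode Driver Framework Runtime). Is the file itself at fault, or is a third-party driver responsible? A web search showed that many users have hit the same problem, but neither the root cause nor a fix was clearly documented, so I decided to analyze the crash.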

The WinDbg analysis results are as follows:

0: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M (1000007e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003. This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG. This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG. This will let us see why this breakpoint is
happening.
Arguments:
Arg1: 80000003, The exception code that was not handled
Arg2: 84289848, The address that the exception occurred at
Arg3: 8d389614, Exception Record Address
Arg4: 8d3891f0, Context Record Address
Debugging Details:
------------------
EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid
FAULTING_IP:
nt!DbgBreakPoint+0
84289848 cc int 3
EXCEPTION_RECORD: 8d389614 -- (.exr 0xffffffff8d389614)
ExceptionAddress: 84289848 (nt!DbgBreakPoint)
 ExceptionCode: 80000003 (Break instruction exception)
 ExceptionFlags: 00000000
NumberParameters: 3
 Parameter[0]: 00000000
 Parameter[1]: 86439a70
 Parameter[2]: 00000065
CONTEXT: 8d3891f0 -- (.cxr 0xffffffff8d3891f0)
eax=87541a60 ebx=00000000 ecx=00000000 edx=00000065 esi=842854bc edi=00000102
eip=84289848 esp=8d3896dc ebp=8d3896f8 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000202
nt!DbgBreakPoint:
84289848 cc int 3
Resetting default scope
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
BUGCHECK_STR: 0x7E
PROCESS_NAME: System
CURRENT_IRQL: 0
ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION} Breakpoint A breakpoint has been reached.
EXCEPTION_PARAMETER1: 00000000
EXCEPTION_PARAMETER2: 86439a70
EXCEPTION_PARAMETER3: 00000065
LAST_CONTROL_TRANSFER: from 84e56c57 to 84289848
STACK_TEXT:
8d3896d8 84e56c57 b25fc8d8 b25fc720 00000000 nt!DbgBreakPoint
8d3896f8 84e464af b25fc8d8 84e6cd18 4da038d8 Wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+0x5e
8d389718 84e482a7 00000000 84e6efc8 8777e958 Wdf01000!FxIoQueue::StopProcessingForPower+0xcf
8d389738 84e3fe61 00000001 8d389760 84e5dcdc Wdf01000!FxPkgIo::StopProcessingForPower+0xbd
8d389744 84e5dcdc 867eeb48 00000001 84e6f2f8 Wdf01000!FxDeviceToMx::FxPkgIo_StopProcessingForPower+0x16
8d389760 84e5cd81 8777e958 8777ea64 8777e958 Wdf01000!FxPkgPnp::PowerGotoD3Stopped+0x47
8d3897e8 84e5dbb2 00000314 8777ea64 8777e958 Wdf01000!FxPkgPnp::PowerEnterNewState+0x11c
8d38980c 84e5e5bb 8d389824 84e711d4 8777e958 Wdf01000!FxPkgPnp::PowerProcessEventInner+0x171
8d389830 84e63e73 00000001 8d3898c4 84e63832 Wdf01000!FxPkgPnp::PowerProcessEvent+0x15c
8d38983c 84e63832 8777e958 8777ead4 8777e958 Wdf01000!FxPkgPnp::PowerPolStopping+0x1a
8d3898c4 84e64716 0000055b 8777ead4 8777e958 Wdf01000!FxPkgPnp::PowerPolicyEnterNewState+0x11c
8d3898e8 84e65388 8d389900 00000008 8777e958 Wdf01000!FxPkgPnp::PowerPolicyProcessEventInner+0x185
8d38990c 84e61b74 00000001 00000000 84e6209c Wdf01000!FxPkgPnp::PowerPolicyProcessEvent+0x172
8d389918 84e6209c 8d389948 84e61484 8777e958 Wdf01000!FxPkgPnp::PnpPowerPolicySurpriseRemove+0xc
8d389920 84e61484 8777e958 8777ea00 8777e958 Wdf01000!FxPkgPnp::PnpEventFailedIoStarting+0xd
8d389948 84e61db2 00000127 8777ea00 8777e958 Wdf01000!FxPkgPnp::PnpEnterNewState+0x104
8d38996c 84e6247a 8d389984 87541a60 8777e958 Wdf01000!FxPkgPnp::PnpProcessEventInner+0x149
8d389990 84e5b540 00000400 00000000 8777e958 Wdf01000!FxPkgPnp::PnpProcessEvent+0x13e
8d3899a4 84e60316 8d3899d8 8d3899d0 84e5ae02 Wdf01000!FxPkgPnp::PnpSurpriseRemoval+0x29
8d3899b0 84e5ae02 8777e958 8d3899d8 8661b5c0 Wdf01000!FxPkgFdo::_PnpSurpriseRemoval+0x10
8d3899d0 84e37a3f 8661b5c0 8d3899f8 84e37c63 Wdf01000!FxPkgPnp::Dispatch+0x207
8d3899dc 84e37c63 87908c08 8661b5c0 8661b7e0 Wdf01000!FxDevice::Dispatch+0x7f
8d3899f8 84245c29 87908c08 8661b5c0 8d389a94 Wdf01000!FxDevice::DispatchWithLock+0x7b
8d389a10 843e906d bdb8a8a0 976ddad8 bdb8a8a0 nt!IofCallDriver+0x63
8d389a40 844d5cff bdb8a8a0 00000000 976ddad8 nt!IopSynchronousCall+0xc2
8d389a98 844cdb98 bdb8a8a0 00000017 976ddad8 nt!IopRemoveDevice+0xd4
8d389ac0 844cda21 97d5bc60 00000000 8d389b04 nt!PnpSurpriseRemoveLockedDeviceNode+0x101
8d389ad0 844cdce3 00000003 00000000 00000000 nt!PnpDeleteLockedDeviceNode+0x21
8d389b04 844d1315 bdb5f6a8 97d5bc60 00000003 nt!PnpDeleteLockedDeviceNodes+0x4c
8d389bc4 843c1372 8d389bf4 00000000 b0d13258 nt!PnpProcessQueryRemoveAndEject+0x586
8d389bdc 843cf472 00000000 8b9f55c8 86439a70 nt!PnpProcessTargetDeviceEvent+0x38
8d389c00 8428c1eb 8b9f55c8 00000000 86439a70 nt!PnpDeviceEventWorker+0x216
8d389c50 8441907a 00000001 93c6bbb2 00000000 nt!ExpWorkerThread+0x10d
8d389c90 842bf819 8428c0de 00000001 00000000 nt!PspSystemThreadStartup+0x9e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19
FOLLOWUP_IP:
Wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+5e
84e56c57 8d45f4 lea eax,[ebp-0Ch]
SYMBOL_STACK_INDEX: 1
SYMBOL_NAME: Wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+5e
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: Wdf01000
IMAGE_NAME: Wdf01000.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 4a5bbf28
STACK_COMMAND: .cxr 0xffffffff8d3891f0 ; kb
FAILURE_BUCKET_ID: 0x7E_Wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+5e
BUCKET_ID: 0x7E_Wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+5e
Followup: MachineOwner

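As shown, WinDbg's automatic analysis attributes the crash to Wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+5e in the wdf01000.sys driver. But is wdf01000.sys really to blame? Let's look at the code around the reported location 84e56c57 (Wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+5e):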

0: kd> ub 84e56c57+c l10
Wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+0x3a:
84e56c33 ff15e4a0e684 call dword ptr [Wdf01000!_imp__KeGetCurrentThread (84e6a0e4)]
84e56c39 50           push eax
84e56c3a 6800e6e684   push offset Wdf01000!`string' (84e6e600)
84e56c3f e87edcfbff   call Wdf01000!DbgPrint (84e148c2)
84e56c44 8b45fc       mov eax,dword ptr [ebp-4]
84e56c47 83c410       add esp,10h
84e56c4a 389896000000 cmp byte ptr [eax+96h],bl
84e56c50 7405         je Wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+0x5e (84e56c57)
84e56c52 e8a90d0100   call Wdf01000!DbgBreakPoint (84e67a00) // the bugcheck happens inside this call
84e56c57 8d45f4       lea eax,[ebp-0Ch]
84e56c5a 50           push eax
84e56c5b 53           push ebx
84e56c5c 53           push ebx
84e56c5d 53           push ebx
84e56c5e ff7508       push dword ptr [ebp+8]
84e56c61 ffd6         call esi
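
The address reported by !analyze, 84e56c57, is simply the return address of the call at 84e56c52, which is why the blame lands one instruction past the real trigger. As a rough illustration, the Python sketch below decodes the x86 `E8 rel32` near-call encoding using the bytes from the listing. The helper name rel32_call_target is made up for this article and is not part of WinDbg.

```python
def rel32_call_target(instr_addr, instr_bytes):
    """Return (return_address, call_target) for a 32-bit x86 `E8 rel32` call."""
    if len(instr_bytes) != 5 or instr_bytes[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 rel32 call")
    # rel32 is a signed little-endian displacement measured from the
    # address of the instruction that follows the call.
    rel32 = int.from_bytes(instr_bytes[1:], "little", signed=True)
    return_addr = (instr_addr + 5) & 0xFFFFFFFF
    target = (return_addr + rel32) & 0xFFFFFFFF
    return return_addr, target


# Bytes copied from the listing: 84e56c52 e8a90d0100 call Wdf01000!DbgBreakPoint
ret, target = rel32_call_target(0x84E56C52, bytes.fromhex("e8a90d0100"))
print(f"return address: {ret:08x}")   # 84e56c57, i.e. WaitForSignal+0x5e
print(f"call target:    {target:08x}")  # 84e67a00, i.e. Wdf01000!DbgBreakPoint
```

The computed return address matches the second frame of STACK_TEXT, and the target matches the address WinDbg prints next to the call.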

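The instruction at 84e56c52, just before the reported location, is Wdf01000.sys deliberately calling DbgBreakPoint, and that call is what brought the machine down. DbgBreakPoint exists to help driver developers debug; when no debugger is attached, it results in a blue screen. So did Wdf01000.sys hit an error so severe that it could not continue and had no choice but to take the system down? Let's look at the complete disassembly and work through the logic of Wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal: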

kd> u Wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal l30
Wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal:
84e56bf9 8bff             mov edi,edi
84e56bfb 55               push ebp
84e56bfc 8bec             mov ebp,esp
84e56bfe 83ec0c           sub esp,0Ch
84e56c01 834df8ff         or dword ptr [ebp-8],0FFFFFFFFh
84e56c05 53               push ebx
84e56c06 56               push esi
84e56c07 8b35b0a0e684     mov esi,dword ptr [Wdf01000!_imp__KeWaitForSingleObject (84e6a0b0)]
84e56c0d 57               push edi
84e56c0e 8d45f4           lea eax,[ebp-0Ch]
84e56c11 50               push eax
84e56c12 33db             xor ebx,ebx
84e56c14 53               push ebx
84e56c15 53               push ebx
84e56c16 53               push ebx
84e56c17 ff7508           push dword ptr [ebp+8]
84e56c1a 894dfc           mov dword ptr [ebp-4],ecx
84e56c1d c745f400ba3cdc   mov dword ptr [ebp-0Ch],0DC3CBA00h
84e56c24 ffd6             call esi
84e56c26 bf02010000       mov edi,102h
84e56c2b eb36             jmp Wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+0x6a (84e56c63)
84e56c2d ff7510           push dword ptr [ebp+10h]
84e56c30 ff750c           push dword ptr [ebp+0Ch]
84e56c33 ff15e4a0e684     call dword ptr [Wdf01000!_imp__KeGetCurrentThread (84e6a0e4)]
84e56c39 50               push eax
84e56c3a 6800e6e684       push offset Wdf01000!`string' (84e6e600)
84e56c3f e87edcfbff       call Wdf01000!DbgPrint (84e148c2)
84e56c44 8b45fc           mov eax,dword ptr [ebp-4]
84e56c47 83c410           add esp,10h
84e56c4a 389896000000     cmp byte ptr [eax+96h],bl // key point: when the byte at [eax+96h] is nonzero, DbgBreakPoint is called
84e56c50 7405             je Wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+0x5e (84e56c57)
84e56c52 e8a90d0100       call Wdf01000!DbgBreakPoint (84e67a00)
84e56c57 8d45f4           lea eax,[ebp-0Ch]
84e56c5a 50               push eax
84e56c5b 53               push ebx
84e56c5c 53               push ebx
84e56c5d 53               push ebx
84e56c5e ff7508           push dword ptr [ebp+8]
84e56c61 ffd6             call esi
84e56c63 3bc7             cmp eax,edi
84e56c65 74c6             je Wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+0x34 (84e56c2d)
84e56c67 5f               pop edi
84e56c68 5e               pop esi
84e56c69 5b               pop ebx
84e56c6a c9               leave
84e56c6b c20c00           ret 0Ch
84e56c6e cc               int 3
84e56c6f cc               int 3

This disassembly corresponds to the following pseudocode:

NTSTATUS _FX_DRIVER_GLOBALS::WaitForSignal(void *this, PVOID Object, int a3, int a4)
{
     NTSTATUS status;
     PKTHREAD thread;
     LARGE_INTEGER Timeout;
     Timeout.QuadPart = -600000000LL; // relative timeout in 100 ns units: one minute
     status = KeWaitForSingleObject(Object, Executive, KernelMode, FALSE, &Timeout);
     while (status == STATUS_TIMEOUT) // 102h, kept in edi
     {
         thread = KeGetCurrentThread();
         DbgPrint("Thread 0x%p is %s 0x%p\n", thread, a3, a4);
         if (this->FxVerifierDbgBreakOnError) // cmp byte ptr [eax+96h],bl
             DbgBreakPoint();
         status = KeWaitForSingleObject(Object, Executive, KernelMode, FALSE, &Timeout);
     }
     return status;
}

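The logic is: wait on Object for one minute. If the wait times out, check whether the this->FxVerifierDbgBreakOnError flag is set. If it is, break into the debugger; if it is not, keep looping and waiting.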
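
The one-minute figure can be checked against the disassembly: `or dword ptr [ebp-8],0FFFFFFFFh` sets the high half of the 64-bit timeout and `mov dword ptr [ebp-0Ch],0DC3CBA00h` sets the low half. A small Python sketch of that arithmetic follows; the function name is hypothetical, and it relies on the kernel convention that a negative timeout is a relative interval counted in 100-nanosecond units.

```python
def relative_timeout_seconds(low_dword, high_dword):
    """Convert the two 32-bit halves of a LARGE_INTEGER timeout to seconds."""
    raw = ((high_dword & 0xFFFFFFFF) << 32) | (low_dword & 0xFFFFFFFF)
    if raw >= 1 << 63:           # reinterpret the 64 bits as a signed integer
        raw -= 1 << 64
    if raw >= 0:
        raise ValueError("not a relative timeout (value is not negative)")
    return -raw / 10_000_000     # 10,000,000 intervals of 100 ns per second


# Halves taken from the disassembly of WaitForSignal.
print(relative_timeout_seconds(0xDC3CBA00, 0xFFFFFFFF))  # prints 60.0
```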

What is this flag? Let's look at the _FX_DRIVER_GLOBALS structure:

kd> dt Wdf01000!_FX_DRIVER_GLOBALS
 +0x000 Linkage : _LIST_ENTRY
 +0x008 WdfHandleMask : Uint4B
 +0x00c WdfVerifierAllocateFailCount : Int4B
 +0x010 Tag : Uint4B
 +0x014 Driver : Ptr32 FxDriver
 +0x018 DebugExtension : Ptr32 FxDriverGlobalsDebugExtension
 +0x01c LibraryGlobals : Ptr32 FxLibraryGlobalsType
 +0x020 WdfTraceDelayTime : Uint4B
 +0x024 WdfLogHeader : Ptr32 Void
 +0x028 FxPoolFrameworks : FX_POOL
 +0x07c FxPoolTrackingOn : UChar
 +0x080 ThreadTableLock : MxLock
 +0x084 ThreadTable : Ptr32 _LIST_ENTRY
 +0x088 WdfBindInfo : Ptr32 _WDF_BIND_INFO
 +0x08c ImageAddress : Ptr32 Void
 +0x090 ImageSize : Uint4B
 +0x094 FxVerifierOn : UChar
 +0x095 FxVerifyDownlevel : UChar
 +0x096 FxVerifierDbgBreakOnError : UChar
 +0x097 FxVerifierDbgBreakOnDeviceStateError : UChar
 +0x098 FxVerifierHandle : UChar
 +0x099 FxVerifierIO : UChar
 +0x09a FxVerifierLock : UChar
 +0x09b FxVerifyOn : UChar
 +0x09c FxVerboseOn : UChar
 +0x09d FxForceLogsInMiniDump : UChar
 +0x09e FxRequestParentOptimizationOn : UChar
 +0x09f FxTrackDriverForMiniDumpLog : UChar
 +0x0a0 BugCheckDriverInfoIndex : Uint4B
 +0x0a4 BugCheckCallbackRecord : _KBUGCHECK_REASON_CALLBACK_RECORD
 +0x0c0 FxEnhancedVerifierOptions : Uint4B
 +0x0c8 Public : _WDF_DRIVER_GLOBALS

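Judging by the name and the MSDN documentation, this is a flag that controls debugging behavior, and it is presumably set from the DbgBreakOnError value in the registry.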
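
Mapping the 96h displacement in `cmp byte ptr [eax+96h],bl` to a field name can also be done mechanically. The Python sketch below parses a few lines of the `dt` output shown above and looks up offset 0x96. The parser is a throwaway helper written for this article, and it assumes the simple `+0xNNN Name : Type` layout seen here.

```python
import re

# A few lines copied from the `dt Wdf01000!_FX_DRIVER_GLOBALS` output.
DT_OUTPUT = """\
 +0x094 FxVerifierOn : UChar
 +0x095 FxVerifyDownlevel : UChar
 +0x096 FxVerifierDbgBreakOnError : UChar
 +0x097 FxVerifierDbgBreakOnDeviceStateError : UChar
 +0x098 FxVerifierHandle : UChar
"""

FIELD_RE = re.compile(r"^\s*\+0x([0-9a-fA-F]+)\s+(\w+)\s*:\s*(.+?)\s*$")


def parse_dt(text):
    """Map each field offset to a (field name, type name) pair."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[int(match.group(1), 16)] = (match.group(2), match.group(3))
    return fields


name, type_name = parse_dt(DT_OUTPUT)[0x96]
print(f"+0x096 -> {name} ({type_name})")
```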

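Debug settings are normally enabled during development to make problems easier to find. Released software should not ship with diagnostic modes turned on: they add overhead to the system and may cause unpredictable side effects.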

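Back to this particular blue screen. The stack shows the system telling the driver to stop working: the PnP manager is processing a device removal (the PnpSurpriseRemoval frames), and the framework is powering the device down and stopping its I/O queues (PowerGotoD3Stopped, StopProcessingForPower). The wait got no response within one minute, and because this driver had the diagnostic debug setting enabled, the framework called DbgBreakPoint, intending to break into a debugger. But this is an end user's machine, not a development environment, so there is no debugger and the system blue-screens instead. The one-minute stall may be caused by a bug in the driver itself, or simply by a busy system. Either way, it is inappropriate for released software to leave debug settings enabled and cause side effects like this.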
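
To make the two outcomes concrete, here is a small user-mode Python model of the loop. It follows the pseudocode in this article, not the actual KMDF source, and the names wait_for_signal, UnhandledBreakpoint, and debugger_attached are invented for the illustration. The real waits are replaced by a scripted list of status codes.

```python
STATUS_SUCCESS = 0x00000000
STATUS_TIMEOUT = 0x00000102   # the 102h that the disassembly keeps in edi


class UnhandledBreakpoint(Exception):
    """Stands in for bugcheck 0x7E with exception code 0x80000003."""


def wait_for_signal(wait_results, dbg_break_on_error, debugger_attached=False):
    """Replay the WaitForSignal loop over a scripted list of wait statuses.

    Returns (final_status, number_of_timeouts).
    """
    results = iter(wait_results)
    timeouts = 0
    status = next(results)
    while status == STATUS_TIMEOUT:
        timeouts += 1
        # With a debugger attached the break is caught and execution resumes;
        # without one the breakpoint exception is unhandled.
        if dbg_break_on_error and not debugger_attached:
            raise UnhandledBreakpoint(f"timeout #{timeouts} with no debugger attached")
        status = next(results)
    return status, timeouts


script = [STATUS_TIMEOUT, STATUS_TIMEOUT, STATUS_SUCCESS]
print(wait_for_signal(script, dbg_break_on_error=False))   # (0, 2)
try:
    wait_for_signal(script, dbg_break_on_error=True)
except UnhandledBreakpoint as exc:
    print("bugcheck:", exc)
```

With the flag clear, the same slow device costs two extra minutes of waiting and then completes; with the flag set and no debugger, the first timeout is fatal.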

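Based on this analysis, I contacted several users and examined their machines. In nearly every case a mobile phone driver had this setting enabled, and the settings were almost identical, which suggests these phone drivers were all built from the same template driver.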

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\androidusb]
"Type"=dword:00000001
"Start"=dword:00000003
"ErrorControl"=dword:00000001
"Tag"=dword:0000001e
"ImagePath"=hex(2):53,00,79,00,73,00,74,00,65,00,6d,00,33,00,32,00,5c,00,44,00,
 72,00,69,00,76,00,65,00,72,00,73,00,5c,00,73,00,73,00,61,00,64,00,61,00,64,
 00,62,00,2e,00,73,00,79,00,73,00,00,00
"DisplayName"="SAMSUNG Android Composite ADB Interface Driver"
"Group"="Base"
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\androidusb\Parameters]
"MaximumTransferSize"=dword:00001000
"DebugLevel"=dword:00000002
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\androidusb\Parameters\Wdf]
"VerboseOn"=dword:00000001
"VerifierOn"=dword:00000001            // should be changed to 0; this value is also diagnostic
"DbgBreakOnError"=dword:00000001       // should be changed to 0
"KmdfLibraryVersion"="1.5"
"WdfMajorVersion"=dword:00000001
"WdfMinorVersion"=dword:00000005
"TimeOfLastSqmLog"=hex(b):2d,75,34,49,7b,2b,ce,01
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\androidusb\Enum]
"Count"=dword:00000000
"NextInstance"=dword:00000000

 

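So how do we fix the problem? Open the Registry Editor (regedit.exe), navigate to [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services], search beneath it for DbgBreakOnError and VerifierOn, and set each value to 0. If you are hitting the same problem, give this method a try.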
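
When many services need checking, the search can be done on an exported copy of the services key instead of by hand. The Python sketch below scans export text for the two values and reports those that are nonzero. It is only a reading aid under stated assumptions: the function name is invented here, the export is assumed to be already decoded into a Python string (regedit exports are often UTF-16), and it changes nothing in the registry.

```python
import re

DIAGNOSTIC_VALUES = {"dbgbreakonerror", "verifieron"}
DWORD_RE = re.compile(r'^"([^"]+)"=dword:([0-9a-fA-F]{8})')


def find_enabled_debug_switches(reg_text):
    """Return (key, value name, data) for each nonzero diagnostic DWORD."""
    findings = []
    current_key = None
    for raw_line in reg_text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            current_key = line[1:-1]          # a new registry key section
            continue
        match = DWORD_RE.match(line)
        if not match:
            continue
        name, data = match.group(1), int(match.group(2), 16)
        # Registry value names are case-insensitive.
        if name.lower() in DIAGNOSTIC_VALUES and data != 0:
            findings.append((current_key, name, data))
    return findings


SAMPLE = r'''[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\androidusb\Parameters\Wdf]
"VerboseOn"=dword:00000001
"VerifierOn"=dword:00000001
"DbgBreakOnError"=dword:00000001
"WdfMajorVersion"=dword:00000001
'''

for key, name, data in find_enabled_debug_switches(SAMPLE):
    print(f"{key}: {name} = {data}")
```

Each reported value is a candidate for being set to 0 in the Registry Editor as described above.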

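4 comments on "Dump Analysis Series, Part 1: wdf01000.sys Blue Screen Caused by Misused Debug Switches"

  1. Personally I don't think this is Microsoft's fault; without the blue screen, this logic has no way to handle the situation. The registry switch was turned on by the third-party driver in order to find problems in its own code, and unfortunately it ended up affecting users. But if the third-party driver does not turn on this debug switch, the system gets stuck in an endless loop. So Microsoft should forcibly tear down the driver object after a few iterations of the loop.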