
The case of the exception that a catch (...) didn't catch - The Old New Thing (microsoft.com)
https://devblogs.microsoft.com/oldnewthing/20240405-00/?p=109621
Raymond Chen, April 5, 2024
A customer thought they had fixed a bug, but they were still crashing because of it.
According to the !analyze output, the problem came from this stack:
contoso!winrt::hresult_error::hresult_error+0x143
contoso!winrt::throw_hresult+0x132
contoso!winrt::impl::consume_LitWare_IIconProvider
<winrt::LitWare::IIconProvider>::LoadIcon+0x3b
contoso!winrt::Contoso::implementation::IconDataModel::
ReloadIcon$_ResumeCoro$1+0x214
contoso!winrt::impl::resume_background_callback+0x10
ntdll!TppSimplepExecuteCallback+0xa3
ntdll!TppWorkerThread+0x8f6
kernel32!BaseThreadInitThunk+0x1d
ntdll!RtlUserThreadStart+0x28
This was confusing, because "We fixed that bug!" The file version numbers and timestamps confirm that the ReloadIcon code does catch the exception:
try
{
    icon = m_provider.LoadIcon(); // ⇐ blamed frame
}
catch(...)
{
    // There was a problem getting the new icon.
    // Just stick with the old one.
    LOG_CAUGHT_EXCEPTION();
    co_return;
}
Let's look at the stack at the point of the crash:
KERNELBASE!RaiseFailFastException+0x152
combase!RoFailFastWithErrorContextInternal2+0x4d9
contoso!wil::details::FailfastWithContextCallback+0xc1
contoso!wil::details::WilFailFast+0x47
contoso!wil::details::ReportFailure_NoReturn<3>+0x2df
contoso!wil::details::ReportFailure_Base<3,0>+0x30
contoso!wil::details::ReportFailure_CaughtExceptionCommonNoReturnBase<3>+0xa7
contoso!wil::details::ReportFailure_CaughtExceptionCommon+0x22
contoso!wil::details::ReportFailure_CaughtException<3>+0x40
contoso!wil::details::in1diag3::FailFast_CaughtException+0x13
contoso!`<lambda_f370031fe3623a0b308de0bbdeb2db76>::operator()'::`1'::catch$2+0x22
ucrtbase!_CallSettingFrame_LookupContinuationIndex+0x20
ucrtbase!__FrameHandler4::CxxCallCatchBlock+0x115
ntdll!RcFrameConsolidation+0x6
contoso!<lambda_f370031fe3623a0b308de0bbdeb2db76>::operator()+0x1a
contoso!std::invoke+0x24
contoso!std::_Invoker_ret<void,1>::_Call+0x24
contoso!std::_Func_impl_no_alloc<<lambda_f370031fe3623a0b308de0bbdeb2db76>,
void,Concurrency::task<void> >::_Do_call+0x28
contoso!std::_Func_class<void,Concurrency::task<void> >::operator()+0x31
contoso!Concurrency::details::_MakeTToUnitFunc::__l2::
<lambda_64124396551846798083ef48cd389b4a>::operator()+0x46
contoso!std::invoke+0x66
contoso!std::_Invoker_ret<unsigned char,0>::_Call+0x66
contoso!std::_Func_impl_no_alloc<<lambda_64124396551846798083ef48cd389b4a>,
unsigned char,Concurrency::task<void> >::_Do_call+0x72
contoso!std::_Func_class<unsigned char,Concurrency::task<void> >::
operator()+0x32
contoso!Concurrency::task<void>::_ContinuationTaskHandle<void,
void,std::function<void __cdecl(Concurrency::task<void>)>,
std::integral_constant<bool,1>,Concurrency::details::_TypeSelectorNoAsync>::
_LogWorkItemAndInvokeUserLambda<std::function<unsigned char __cdecl(
Concurrency::task<void>)>,Concurrency::task<void> >+0x8b
contoso!Concurrency::task<void>::_ContinuationTaskHandle<void,
void,std::function<void __cdecl(Concurrency::task<void>)>,
std::integral_constant<bool,1>,Concurrency::details::_TypeSelectorNoAsync>::
_Continue+0x8c
contoso!Concurrency::task<void>::_ContinuationTaskHandle<void,
void,std::function<void __cdecl(Concurrency::task<void>)>,
std::integral_constant<bool,1>,Concurrency::details::_TypeSelectorNoAsync>::
_Perform+0x8
contoso!Concurrency::details::_PPLTaskHandle<unsigned char,Concurrency::task<
void>::_ContinuationTaskHandle<void,void,std::function<
void __cdecl(Concurrency::task<void>)>,std::integral_constant<bool,1>,
Concurrency::details::_TypeSelectorNoAsync>,
Concurrency::details::_ContinuationTaskHandleBase>::invoke+0x37
contoso!Concurrency::details::_TaskProcHandle::_RunChoreBridge+0x25
contoso!Concurrency::details::_DefaultPPLTaskScheduler::_PPLTaskChore::
_Callback+0x26
msvcp140!Concurrency::details::`anonymous namespace'::
_Task_scheduler_callback+0x5d
ntdll!TppWorkpExecuteCallback+0x13a
ntdll!TppWorkerThread+0x8f6
kernel32!BaseThreadInitThunk+0x1d
ntdll!RtlUserThreadStart+0x28
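Hey, wait, this looks nothing like the stack that !analyze reported! What's going on?
!analyze uses the stack of the first stowed exception. You can dump all of the stowed exceptions with the !pde.dse command.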
0:076> !pde.dse
Stowed Exception Array @ 0x000000002b1ef170
Stowed Exception #1 @ 0x000000001ce068e8
0x80070005 (FACILITY_WIN32 - Win32 Undecorated Error Codes):
E_ACCESSDENIED - General access denied error
Stack : 0x2b214de0
contoso!winrt::hresult_error::hresult_error+0x143
contoso!winrt::throw_hresult+0x132
contoso!winrt::impl::consume_LitWare_IIconProvider
<winrt::LitWare::IIconProvider>::LoadIcon+0x3b
contoso!winrt::Contoso::implementation::IconDataModel::
ReloadIcon$_ResumeCoro$1+0x214
contoso!winrt::impl::resume_background_callback+0x10
ntdll!TppSimplepExecuteCallback+0xa3
ntdll!TppWorkerThread+0x8f6
kernel32!BaseThreadInitThunk+0x1d
ntdll!RtlUserThreadStart+0x28
Stowed Exception #2 @ 0x000000001ce02378
0x80070005 (FACILITY_WIN32 - Win32 Undecorated Error Codes):
E_ACCESSDENIED - General access denied error
Stack : 0x12cda890
litware!winrt::hresult_error::hresult_error+0x12c
litware!winrt::throw_hresult+0x83
litware!winrt::LitWare::implementation::IconProvider::LoadIcon+0x90
litware!winrt::impl::produce<winrt::LitWare::implementation::IconProvider,
winrt::LitWare::IIconProvider>::LoadIcon+0x1b
contoso!winrt::impl::consume_LitWare_IIconProvider
<winrt::LitWare::IIconProvider>::LoadIcon+0x3b
contoso!winrt::Contoso::implementation::IconDataModel::
ReloadIcon$_ResumeCoro$1+0x214
contoso!winrt::impl::resume_background_callback+0x10
ntdll!TppSimplepExecuteCallback+0xa3
ntdll!TppWorkerThread+0x8f6
kernel32!BaseThreadInitThunk+0x1d
ntdll!RtlUserThreadStart+0x28
Stowed Exception #3 @ 0x000000001ce04fa8
0x80070005 (FACILITY_WIN32 - Win32 Undecorated Error Codes):
E_ACCESSDENIED - General access denied error
Stack : 0x1d94b410
combase!RoOriginateError+0x51
contoso!wil::details::RaiseRoOriginateOnWilExceptions+0x137
contoso!wil::details::ReportFailure_Return<1>+0x1b8
contoso!wil::details::ReportFailure_Win32<1>+0x70
contoso!wil::details::in1diag3::Return_Win32+0x18
contoso!Internal::ContosoSettingsStorage::Save+0xdc729
contoso!Internal::ContosoSettings::SaveToDefaultLocalStorage+0xf1
contoso!Internal::ContosoSettings::Save+0x4ef
contoso!Contoso::AppSettings::save+0x4ef
contoso!std::_Func_impl_no_alloc<<lambda_f4300885c0b58e31cf789c4999ed9d7a>,
void>::_Do_call+0x2b
contoso!std::_Func_impl_no_alloc<<lambda_052e919cc0e5399df76dff3972c0cac1>,
unsigned char>::_Do_call+0x28
contoso!Concurrency::task<unsigned char>::_InitialTaskHandle<void,
<lambda_f4300885c0b58e31cf789c4999ed9d7a>,
Concurrency::details::_TypeSelectorNoAsync>::_Init+0xc3
contoso!Concurrency::details::_PPLTaskHandle<unsigned char,
Concurrency::task<unsigned char>::_InitialTaskHandle<void,
<lambda_f4300885c0b58e31cf789c4999ed9d7a>,
Concurrency::details::_TypeSelectorNoAsync>,
Concurrency::details::_TaskProcHandle>::invoke+0x55
contoso!Concurrency::details::_TaskProcHandle::_RunChoreBridge+0x25
contoso!Concurrency::details::_DefaultPPLTaskScheduler::_PPLTaskChore::
_Callback+0x26
msvcp140!Concurrency::details::`anonymous namespace'::
_Task_scheduler_callback+0x5d
ntdll!TppWorkpExecuteCallback+0x13a
ntdll!TppWorkerThread+0x686
kernel32!BaseThreadInitThunk+0x10
ntdll!RtlUserThreadStart+0x2b
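Now things are starting to come into focus.
The rule of thumb for Windows Runtime exceptions is that before you throw an exception or return a failure HRESULT, you call RoOriginateError to capture the stack and other context. This matters because a common pattern in the Windows Runtime is for the exception to be caught and saved ("stowed"), typically in an IAsyncAction or similar interface, and then rethrown later, when the caller performs a co_await or something similar.
When the exception is rethrown, the original stack has already been unwound, so there is nothing left on the stack to trace. Calling RoOriginateError captures the stack at the point of failure before it is too late. That information can then be used to "stitch together" the lifetime of the exception, starting with the code that threw it and ending with the code that tried (and failed) to catch it.
The system does this stack stitching by keeping an error history in per-thread data. Components can capture that history and transfer it to another thread when a task's error state moves between threads, and the system looks for errors with the same HRESULT. If there is a recently captured stack whose HRESULT matches that of the unhandled exception, the system says, "I bet these two go together."
Usually, all this stack stitching works well, because our API design principles say that exceptions should not be thrown for recoverable errors. That means there usually isn't a lot of exception traffic, so the false-positive rate is low.
But in this case, we have a false positive: IconDataModel called IconProvider::LoadIcon(), which failed with E_ACCESSDENIED. That exception was then caught and handled. We can see this from the first two stowed exceptions, applying what we learned earlier about stitching multiple error stacks together to get a more complete picture of what led to the failure.
Here, IconProvider::LoadIcon() explicitly threw an exception with throw_hresult (stowed exception #2). At the ABI boundary, that C++ exception was converted to an HRESULT, and on the other side, C++/WinRT converted the HRESULT back into an exception and rethrew it (stowed exception #1). That rethrown exception was then caught by the catch(...), and that was the end of that exception.
But that's not what caused our crash.
The currently active stack shows that we raised a fail-fast exception from a lambda. The debugger tells us it was this lambda: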
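void ViewPreferences::SaveChanges()
{
    m_settings.save_async()
        .then([](concurrency::task<void> precedingTask) {
            try
            {
                precedingTask.get();
            }
            CATCH_FAIL_FAST();
        });
}
The code saves the settings and fails fast if the operation fails.
We see that failure in the third stack, the one for ContosoSettingsStorage::Save. That Save operation failed with E_ACCESSDENIED, and the failure was recorded in the error history.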
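What happened is that two E_ACCESSDENIED errors occurred at around the same time, and !analyze tried to figure out which stack belonged to which sequence but didn't quite succeed: it concluded that the current failure matched the m_provider.LoadIcon() failure. But using our human brains, we can see that the m_provider.LoadIcon() exception was handled, and the real culprit is stowed exception #3.
If your code receives one error code and returns a different one, you can call the RoTransformError function. This tells COM error tracking that the two error sequences should be stitched together into one larger error sequence.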