A Swin Transformer-Based Model for Thyroid Nodule Detection in Ultrasound Images

Ye Tian; Jingqiang Zhu; Lei Zhang; Lichao Mou; Xiaoxiang Zhu; Yilei Shi; Buyun Ma; Wanjun Zhao

doi:10.3791/64480

JoVE Journal > Medicine

Published: April 21, 2023

¹Department of Ultrasonography, West China Hospital of Sichuan University, ²Department of Thyroid Surgery, West China Hospital of Sichuan University, ³MedAI Technology (Wuxi) Co. Ltd.

Summary

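Here, a new model for thyroid nodule detection in ultrasound images is proposed. It uses the Swin Transformer as the backbone to perform long-range context modeling. Experiments show that it performs well in terms of sensitivity and accuracy.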

Abstract

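In recent years, the incidence of thyroid cancer has been increasing. Thyroid nodule detection is critical for both the detection and the treatment of thyroid cancer. Convolutional neural networks (CNNs) have achieved good results in thyroid ultrasound image analysis tasks. However, because the effective receptive field of convolutional layers is limited, CNNs fail to capture the long-range contextual dependencies that are important for identifying thyroid nodules in ultrasound images. Transformer networks are effective at capturing long-range contextual information. Inspired by this, we propose a novel thyroid nodule detection method that combines a Swin Transformer backbone with Faster R-CNN. Specifically, an ultrasound image is first projected into a 1D sequence of embeddings, which is then fed into a hierarchical Swin Transformer.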

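The Swin Transformer backbone extracts features at five different scales by using shifted windows for the computation of self-attention. Subsequently, a feature pyramid network (FPN) is used to fuse the features from the different scales. Finally, a detection head predicts bounding boxes and the corresponding confidence scores. Experiments were conducted on data collected from 2,680 patients, and the results showed that this method achieved the best mAP score of 44.8%, outperforming the CNN-based baselines. It also achieved better sensitivity (90.5%) than its competitors. This indicates that the context modeling in this model is effective for thyroid nodule detection.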
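To make the reported metrics concrete, the following is a minimal sketch of how sensitivity can be computed for a detector from predicted and ground-truth boxes. The matching rule is an assumption made for illustration: the text above does not state one, so an intersection-over-union (IoU) threshold of 0.5 with greedy one-to-one matching is used here, and the function names `iou` and `sensitivity` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sensitivity(gt_boxes, pred_boxes, iou_thr=0.5):
    """Fraction of ground-truth nodules matched by a prediction.

    Assumption: a nodule counts as detected when an unused prediction
    overlaps it with IoU >= iou_thr (0.5 here, not taken from the paper).
    """
    if not gt_boxes:
        return 0.0
    unused = list(pred_boxes)
    hits = 0
    for gt in gt_boxes:
        best = max(unused, key=lambda p: iou(gt, p), default=None)
        if best is not None and iou(gt, best) >= iou_thr:
            hits += 1
            unused.remove(best)  # each prediction may match one nodule only
    return hits / len(gt_boxes)
```

With two ground-truth nodules and a single prediction that overlaps only the first one well, `sensitivity` returns 0.5. mAP additionally averages precision over confidence thresholds and is not shown here.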

Introduction

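The incidence of thyroid cancer has increased rapidly since 1970, especially among middle-aged women¹. Thyroid nodules may predict the emergence of thyroid cancer, and most thyroid nodules are asymptomatic². Early detection of thyroid nodules is very helpful in treating thyroid cancer. Therefore, according to current practice guidelines, all patients with suspected nodular goiter on physical examination or with abnormal imaging findings should undergo further examination^3,4.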

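Thyroid ultrasound (US) is a common method used to detect and characterize thyroid lesions^5,6. US is a convenient, inexpensive, and radiation-free technology. However, the application of US is easily affected by the operator^7,8. Features such as the shape, size, echogenicity, and texture of thyroid nodules are easily distinguishable on US images. Although certain US features, such as calcifications, echogenicity, and irregular borders, are often regarded as criteria for identifying thyroid nodules, interobserver variability is unavoidable^8,9. The diagnostic results of radiologists with different levels of experience differ, and inexperienced radiologists are more likely to misdiagnose than experienced ones. Some characteristics of US, such as reflections, shadows, and echoes, can degrade image quality. This degradation, which is caused by the nature of US imaging, makes it difficult for even experienced physicians to locate nodules accurately.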

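Computer-aided diagnosis (CAD) for thyroid nodules has developed rapidly in recent years. It can effectively reduce the errors caused by differences between physicians and help radiologists diagnose nodules quickly and accurately^10,11. Various CNN-based CAD systems have been proposed for thyroid US nodule analysis, including segmentation^12,13, detection^14,15, and classification^16,17. A CNN is a multilayer, supervised learning model¹⁸ whose core modules are the convolution and pooling layers. The convolution layers are used for feature extraction, and the pooling layers are used for downsampling. The shallow convolutional layers extract primary features such as texture, edges, and contours, while the deep convolutional layers learn high-level semantic features.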
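The roles of the two core layers can be illustrated with a toy example. The sketch below is not the authors' code: it is a plain-Python, single-channel illustration in which a hand-picked horizontal-gradient kernel stands in for a learned shallow filter, and the function names `conv2d` and `max_pool2d` are hypothetical.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN conv layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling: downsamples by `size` in each direction."""
    return [
        [
            max(fmap[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# A 4x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(4)]
edge_kernel = [[-1, 1]]                 # responds to left-to-right intensity jumps
edges = conv2d(image, edge_kernel)      # 4x4 map, nonzero only at the edge
pooled = max_pool2d(edges)              # 2x2 map after downsampling
```

Here `edges` is nonzero only in the column where the intensity jumps, and `pooled` keeps that response at half the resolution. Each output value depends only on a small neighborhood of the input, which is the locality that limits the receptive field.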

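CNNs have had great success in computer vision^19,20,21. However, CNNs fail to capture long-range contextual dependencies because the effective receptive field of convolutional layers is limited. In the past, backbone architectures for image classification were mostly CNNs. With the advent of the Vision Transformer (ViT)^22,23, this trend has changed, and many state-of-the-art models now use transformers as backbones. Working on non-overlapping image patches, ViT uses a standard transformer encoder²⁵ to model spatial relationships globally. The Swin Transformer²⁴ further introduces shifted windows to learn features. Because self-attention is calculated within each window, the shifted windows greatly reduce the sequence length that attention operates on, which improves efficiency. At the same time, the shifting operation allows interaction between adjacent windows. The successful application of the Swin Transformer in computer vision has led to the investigation of transformer-based architectures for ultrasound image analysis²⁶.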
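The window mechanism can be sketched on a small grid of token indices. This is an illustration under stated assumptions, not the Swin Transformer implementation: tokens are plain integers, the attention mask that the real model applies to wrapped-around windows is omitted, and the helper names and the 56x56 grid with window size 7 are chosen only for the example.

```python
def cyclic_shift(grid, shift):
    """Roll a 2D grid up and to the left by `shift` cells, with wrap-around."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + shift) % h][(j + shift) % w] for j in range(w)]
            for i in range(h)]


def window_partition(grid, window):
    """Split a grid into non-overlapping window x window blocks (row-major)."""
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0, "grid must tile evenly"
    return [
        [grid[i + di][j + dj] for di in range(window) for dj in range(window)]
        for i in range(0, h, window)
        for j in range(0, w, window)
    ]


def attention_pairs(h, w, window=None):
    """Number of query-key pairs: global attention vs. windowed attention."""
    tokens = h * w
    keys_per_query = tokens if window is None else window * window
    return tokens * keys_per_query


grid = [[r * 4 + c for c in range(4)] for r in range(4)]    # tokens 0..15
regular = window_partition(grid, 2)                         # layer l
shifted = window_partition(cyclic_shift(grid, 1), 2)        # layer l + 1
```

In `regular`, tokens 5, 6, 9, and 10 fall into four different windows; in `shifted`, they share the first window, so information can cross the previous window borders. For an illustrative 56x56 token grid, `attention_pairs(56, 56)` is 9,834,496 while `attention_pairs(56, 56, 7)` is 153,664, a 64-fold reduction.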

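Recently, Li et al. proposed a deep learning approach²⁸ for thyroid papillary cancer detection inspired by Faster R-CNN²⁷. Faster R-CNN is a classic CNN-based object detection architecture. The original Faster R-CNN has four modules: the CNN backbone, the region proposal network (RPN), the ROI pooling layer, and the detection head. The CNN backbone uses a set of basic conv+bn+relu+pooling layers to extract feature maps from the input image. The feature maps are then fed into the RPN and the ROI pooling layer. The role of the RPN is to generate region proposals: it uses softmax to determine whether each anchor is positive and refines the anchors by bounding box regression. The ROI pooling layer collects the input feature maps and the proposals, extracts the proposal feature maps, and feeds them into the subsequent detection head. The detection head uses the proposal feature maps to classify objects and obtains the accurate positions of the detection boxes by bounding box regression.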
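The two operations the RPN applies to each anchor can be sketched as follows. The box parameterization is the standard one from the Faster R-CNN paper (center offsets scaled by the anchor size, log-scale width and height); the function names and the example numbers are assumptions for illustration and do not come from the authors' code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_box(anchor, deltas):
    """Apply regression deltas (tx, ty, tw, th) to an (x1, y1, x2, y2) anchor."""
    ax1, ay1, ax2, ay2 = anchor
    aw, ah = ax2 - ax1, ay2 - ay1
    acx, acy = ax1 + 0.5 * aw, ay1 + 0.5 * ah
    tx, ty, tw, th = deltas
    cx = acx + tx * aw            # shift the center by a fraction of the size
    cy = acy + ty * ah
    w = aw * math.exp(tw)         # scale width and height in log space
    h = ah * math.exp(th)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


# Hypothetical network outputs for one anchor: [background, foreground] scores
# and four regression deltas.
objectness = softmax([0.2, 2.1])[1]
proposal = decode_box((0.0, 0.0, 10.0, 10.0), (0.1, 0.0, 0.0, 0.0))
```

An anchor is kept as a proposal when its foreground probability is high, and `decode_box` moves and resizes it; in this example the anchor is shifted right by one tenth of its width. The detection head repeats the same kind of classification and regression on the pooled proposal features.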

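This paper presents Swin Faster R-CNN, a new thyroid nodule detection network formed by replacing the CNN backbone of Faster R-CNN with the Swin Transformer so that features for nodule detection are extracted more effectively from ultrasound images. In addition, a feature pyramid network (FPN)²⁹ is used to improve the detection performance of the model for nodules of different sizes by aggregating features at different scales.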
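The top-down aggregation performed by an FPN can be sketched on toy single-channel maps. This is a simplification under stated assumptions: a real FPN applies 1x1 lateral convolutions before the addition and 3x3 convolutions afterwards, both of which are omitted here, and the names `upsample2x` and `fpn_top_down` are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2D grid."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fpn_top_down(features):
    """Fuse backbone maps ordered from fine to coarse.

    Each map is assumed to be exactly half the resolution of the previous one.
    Returns the fused maps in the same fine-to-coarse order.
    """
    fused = [features[-1]]                  # start from the coarsest map
    for lateral in reversed(features[:-1]):
        up = upsample2x(fused[0])           # bring coarse semantics up in size
        merged = [[a + b for a, b in zip(row_l, row_u)]
                  for row_l, row_u in zip(lateral, up)]
        fused.insert(0, merged)
    return fused


c1 = [[1] * 4 for _ in range(4)]    # fine map (small nodules)
c2 = [[2] * 2 for _ in range(2)]
c3 = [[4]]                          # coarse map (large nodules)
p1, p2, p3 = fpn_top_down([c1, c2, c3])
```

After fusion, every level carries information from all coarser levels (here `p2` holds 6 everywhere and `p1` holds 7), which is why detection can run at each scale with both fine localization and coarse context.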

Protocol

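This retrospective study was approved by the institutional review board of West China Hospital, Sichuan University, Sichuan, China, and the requirement to obtain informed consent was waived.

1. Environment setup

Graphics processing unit (GPU) software: To implement deep learning applications, first configure the GPU-related environment. From the GPU's website, suitable for the GPU…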

Representative Results

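The thyroid US images were collected from two hospitals in China between September 2008 and February 2018. The eligibility criteria for including US images in this study were a conventional US examination before biopsy and surgical treatment, a diagnosis confirmed by biopsy or postsurgical pathology, and age ≥18 years. The exclusion criterion was images without thyroid tissue. The 3,000 ultrasound images included 1,384 malignant and 1,616 benign nodules…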

Discussion

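This paper describes in detail how to perform the environment setup, data preparation, model configuration, and network training. In the environment setup phase, care must be taken to ensure that the dependent libraries are compatible and that their versions match. Data processing is a very important step, and time and effort must be spent to ensure the accuracy of the annotations…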

Disclosures

The authors have nothing to disclose.

Acknowledgements

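This study was supported by the National Natural Science Foundation of China (Grant No. 32101188) and the General Project of the Science and Technology Department of Sichuan Province, China (Grant No. 2021YFS0102).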

Materials

Name	Company	Catalog Number	Comments
GPU RTX3090	Nvidia	1	24G GPU
mmdetection2.11.0	SenseTime	4	https://github.com/open-mmlab/mmdetection.git
python3.8	—	2	https://www.python.org
pytorch1.7.1	Facebook	3	https://pytorch.org
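
Because the versions listed above need to match one another, a small check of the installed packages can be useful before training. The sketch below is an assumption-laden helper, not part of the published protocol: it uses only the standard library, the name `check_environment` is hypothetical, and the pip distribution names `torch` and `mmdet` are assumed to correspond to the pytorch and mmdetection entries in the table.

```python
import sys
from importlib import metadata

# Versions taken from the Materials table; the pip names are assumptions.
EXPECTED = {"torch": "1.7.1", "mmdet": "2.11.0"}


def check_environment(expected=EXPECTED):
    """Report the Python version and whether each package matches `expected`."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        report[package] = {
            "installed": installed,
            "expected": wanted,
            # startswith tolerates local build suffixes such as "+cu110"
            "ok": installed is not None and installed.startswith(wanted),
        }
    return report
```

Calling `check_environment()` returns a dictionary that can be printed or logged; an entry whose `ok` field is false points to a missing or mismatched package.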

References

1. Grant, E. G., et al. Thyroid ultrasound reporting lexicon: White paper of the ACR Thyroid Imaging, Reporting and Data System (TIRADS) committee. Journal of the American College of Radiology. 12 (12 Pt A), 1272-1279 (2015).
2. Zhao, J., Zheng, W., Zhang, L., Tian, H. Segmentation of ultrasound images of thyroid nodule for assisting fine needle aspiration cytology. Health Information Science and Systems. 1, 5 (2013).
3. Haugen, B. R. American Thyroid Association management guidelines for adult patients with thyroid nodules and differentiated thyroid cancer: What is new and what has changed. Cancer. 123 (3), 372-381 (2017).
4. Shin, J. H., et al. Ultrasonography diagnosis and imaging-based management of thyroid nodules: Revised Korean Society of Thyroid Radiology consensus statement and recommendations. Korean Journal of Radiology. 17 (3), 370-395 (2016).
5. Horvath, E., et al. An ultrasonogram reporting system for thyroid nodules stratifying cancer risk for clinical management. The Journal of Clinical Endocrinology & Metabolism. 94 (5), 1748-1751 (2009).
6. Park, J.-Y., et al. A proposal for a thyroid imaging reporting and data system for ultrasound features of thyroid carcinoma. Thyroid. 19 (11), 1257-1264 (2009).
7. Moon, W.-J., et al. Benign and malignant thyroid nodules: US differentiation-Multicenter retrospective study. Radiology. 247 (3), 762-770 (2008).
8. Park, C. S., et al. Observer variability in the sonographic evaluation of thyroid nodules. Journal of Clinical Ultrasound. 38 (6), 287-293 (2010).
9. Kim, S. H., et al. Observer variability and the performance between faculties and residents: US criteria for benign and malignant thyroid nodules. Korean Journal of Radiology. 11 (2), 149-155 (2010).
10. Choi, Y. J., et al. A computer-aided diagnosis system using artificial intelligence for the diagnosis and characterization of thyroid nodules on ultrasound: initial clinical assessment. Thyroid. 27 (4), 546-552 (2017).
11. Chang, T.-C. The role of computer-aided detection and diagnosis system in the differential diagnosis of thyroid lesions in ultrasonography. Journal of Medical Ultrasound. 23 (4), 177-184 (2015).
12. Li, X. Fully convolutional networks for ultrasound image segmentation of thyroid nodules. 886-890 (2018).
13. Nguyen, D. T., Choi, J., Park, K. R. Thyroid nodule segmentation in ultrasound image based on information fusion of suggestion and enhancement networks. Mathematics. 10 (19), 3484 (2022).
14. Ma, J., Wu, F., Jiang, T. A., Zhu, J., Kong, D. Cascade convolutional neural networks for automatic detection of thyroid nodules in ultrasound images. Medical Physics. 44 (5), 1678-1691 (2017).
15. Song, W., et al. Multitask cascade convolution neural networks for automatic thyroid nodule detection and recognition. IEEE Journal of Biomedical and Health Informatics. 23 (3), 1215-1224 (2018).
16. Wang, J., et al. Learning from weakly-labeled clinical data for automatic thyroid nodule classification in ultrasound images. 3114-3118 (2018).
17. Wang, L., et al. A multi-scale densely connected convolutional neural network for automated thyroid nodule classification. Frontiers in Neuroscience. 16, 878718 (2022).
18. Krizhevsky, A., Sutskever, I., Hinton, G. E. ImageNet classification with deep convolutional neural networks. Communications of the ACM. 60 (6), 84-90 (2017).
19. He, K., Zhang, X., Ren, S., Sun, J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770-778 (2016).
20. Hu, H., Gu, J., Zhang, Z., Dai, J., Wei, Y. Relation networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3588-3597 (2018).
21. Szegedy, C., et al. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1-9 (2015).
22. Dosovitskiy, A., et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv:2010.11929 (2020).
23. Touvron, H., et al. Training data-efficient image transformers & distillation through attention. arXiv:2012.12877 (2021).
24. Liu, Z., et al. Swin Transformer: Hierarchical vision transformer using shifted windows. 2021 IEEE/CVF International Conference on Computer Vision (ICCV). 9992-10002 (2021).
25. Vaswani, A., et al. Attention is all you need. Advances in Neural Information Processing Systems. 30 (2017).
26. Chen, J., et al. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv:2102.04306 (2021).
27. Ren, S., He, K., Girshick, R., Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems. 28, 91-99 (2015).
28. Li, H., et al. An improved deep learning approach for detection of thyroid papillary cancer in ultrasound images. Scientific Reports. 8, 6600 (2018).
29. Lin, T.-Y., et al. Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2117-2125 (2017).
30. Ouahabi, A. A review of wavelet denoising in medical imaging. 2013 8th International Workshop on Systems, Signal Processing and their Applications. 19-26 (2013).
31. Mahdaoui, A. E., Ouahabi, A., Moulay, M. S. Image denoising using a compressive sensing approach based on regularization constraints. Sensors. 22 (6), 2199 (2022).
32. Castleman, K. R. Digital Image Processing. (1996).
33. Liu, W., et al. SSD: Single shot multibox detector. European Conference on Computer Vision. 21-37 (2016).
34. Redmon, J., Farhadi, A. YOLOv3: An incremental improvement. arXiv:1804.02767 (2018).
35. Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P. Focal loss for dense object detection. arXiv:1708.02002 (2017).
36. Carion, N., et al. End-to-end object detection with transformers. Computer Vision-ECCV 2020: 16th European Conference. 23-28 (2020).

Cite This Article

Tian, Y., Zhu, J., Zhang, L., Mou, L., Zhu, X., Shi, Y., Ma, B., Zhao, W. A Swin Transformer-Based Model for Thyroid Nodule Detection in Ultrasound Images. J. Vis. Exp. (194), e64480, doi:10.3791/64480 (2023).
