

Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding (PDF Download)

Posted by an anonymous user on 2025-05-31 10:56:26


Content:

1. Introduction

Pretrained backbones with fine-tuning have been widely applied to various 2D vision and NLP tasks [13, 2, 10, 3], where a backbone network pretrained on a large dataset is concatenated with a task-specific back-end and then fine-tuned for different downstream tasks. This approach demonstrates superior performance and great advantages in reducing the workload of network design and training, as well as the amount of labeled data required for different vision tasks.

Interns at Microsoft Research Asia. †Contact person.

In this work, we present a pretrained 3D backbone, named SWIN3D, for 3D indoor scene understanding tasks. Our method represents the 3D point cloud of an input 3D scene as sparse voxels in 3D space and adapts the Swin Transformer [30], designed for regular 2D images, to unorganized 3D points as the 3D backbone. We analyze the key issues that prevent the naïve 3D extension of Swin Transformer from exploring large models and achieving high performance, namely the high memory complexity and the ignorance of signal irregularity. Based on our analysis, we develop a novel 3D self-attention operator to compute the self-attentions of sparse voxels within each local window. This operator reduces the memory cost of self-attention from quadratic to linear with respect to the number of sparse voxels within a window and computes efficiently; it also enhances self-attention by capturing various signal irregularities through our generalized contextual relative positional embedding [48, 26].
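The two ideas in this paragraph, sparse voxels grouped into local windows and window attention whose memory grows linearly rather than quadratically, can be illustrated with a small sketch. It is not the SWIN3D operator: the function names (voxelize, partition_windows, window_attention), the voxel and window sizes, and the use of voxel centroids as features are all assumptions made for illustration. The linear-memory behaviour is obtained here with a generic streaming softmax, which may differ from the authors' implementation, and the positional embedding is omitted entirely.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size):
    """Map 3D points to integer voxel coordinates, keeping only occupied voxels.

    Returns a dict: voxel coordinate -> centroid of the points inside it.
    """
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    return {
        key: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for key, pts in buckets.items()
    }


def partition_windows(voxels, window_size):
    """Group occupied voxels into non-overlapping cubic windows."""
    windows = defaultdict(list)
    for key in voxels:
        window = tuple(k // window_size for k in key)
        windows[window].append(key)
    return dict(windows)


def window_attention(queries, keys, values):
    """Softmax attention inside one window, computed with a streaming softmax.

    For each query only a running maximum, a running normaliser and a running
    weighted sum are kept, so the M x M score matrix is never stored.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        running_max = -math.inf
        denom = 0.0
        acc = [0.0] * len(values[0])
        for k, v in zip(keys, values):
            score = scale * sum(a * b for a, b in zip(q, k))
            new_max = max(running_max, score)
            correction = math.exp(running_max - new_max)  # rescale old terms
            weight = math.exp(score - new_max)
            denom = denom * correction + weight
            acc = [a * correction + weight * x for a, x in zip(acc, v)]
            running_max = new_max
        outputs.append([a / denom for a in acc])
    return outputs


# Toy scene: four points, voxel size 0.5, windows of 4 x 4 x 4 voxels.
points = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (1.2, 0.1, 0.1), (4.6, 0.2, 0.1)]
voxels = voxelize(points, voxel_size=0.5)
windows = partition_windows(voxels, window_size=4)

# Use voxel centroids as stand-in features and attend within each window.
attended = {}
for window, members in windows.items():
    feats = [list(voxels[m]) for m in members]
    attended[window] = window_attention(feats, feats, feats)
```

Because empty voxels are never created, the cost of each window depends on the number of occupied voxels it contains, not on the window volume.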

The novel design of our SWIN3D backbone enables us to scale up the backbone model and the amount of data used for pretraining. To this end, we pretrained a large SWIN3D model with 60M parameters via a 3D semantic segmentation task over a synthetic 3D indoor scene dataset [60] that includes 21K rooms and is about ten times larger than the ScanNet dataset. After pretraining, we cascade the pretrained SWIN3D backbone with task-specific back-end decoders and fine-tune the models for various downstream 3D indoor scene understanding tasks.
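The cascade-and-fine-tune pattern described above can be sketched with a deliberately tiny model. Everything here is hypothetical: Backbone, LinearHead, fine_tune, the learning rates and the toy data are illustrative stand-ins, not the SWIN3D architecture or its training recipe. The sketch only shows the structure: pretrained backbone weights are copied, a fresh task-specific head is attached, and both are updated on the downstream data.

```python
import copy


class Backbone:
    """Toy feature extractor: one learnable scale per input channel."""

    def __init__(self, scales):
        self.scales = list(scales)

    def forward(self, x):
        return [s * xi for s, xi in zip(self.scales, x)]


class LinearHead:
    """Toy task-specific decoder: a single linear unit over the features."""

    def __init__(self, n_features):
        self.weights = [0.0] * n_features

    def forward(self, feats):
        return sum(w * f for w, f in zip(self.weights, feats))


def mse(backbone, head, data):
    """Mean squared error of the cascaded model on (input, target) pairs."""
    errors = [(head.forward(backbone.forward(x)) - y) ** 2 for x, y in data]
    return sum(errors) / len(errors)


def fine_tune(pretrained, data, steps=200, head_lr=0.05, backbone_lr=0.005):
    """Attach a new head to a copy of the backbone and train both with SGD."""
    backbone = copy.deepcopy(pretrained)  # the pretrained weights stay intact
    head = LinearHead(len(backbone.scales))
    for _ in range(steps):
        for x, y in data:
            feats = backbone.forward(x)
            err = head.forward(feats) - y
            # Gradients of 0.5 * err**2 with respect to head and backbone.
            grad_head = [err * f for f in feats]
            grad_scales = [err * w * xi for w, xi in zip(head.weights, x)]
            head.weights = [w - head_lr * g
                            for w, g in zip(head.weights, grad_head)]
            backbone.scales = [s - backbone_lr * g
                               for s, g in zip(backbone.scales, grad_scales)]
    return backbone, head


# Pretend these scales came from pretraining on a large dataset.
pretrained = Backbone([1.0, 0.5])

# Hypothetical downstream task: predict x0 + x1 from (x0, x1).
task_data = [([1.0, 2.0], 3.0), ([2.0, 1.0], 3.0), ([0.0, 1.0], 1.0)]

loss_before = mse(pretrained, LinearHead(2), task_data)
tuned_backbone, tuned_head = fine_tune(pretrained, task_data)
loss_after = mse(tuned_backbone, tuned_head, task_data)
```

Using a smaller learning rate for the backbone than for the new head is a common fine-tuning choice, assumed here for illustration; the source does not state the schedule used.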
