Paper | arXiv | Video | Project Page
This is the repository that contains source code for the paper:
DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis
- We present DiffuScene, a diffusion model for diverse and realistic indoor scene synthesis.
You can create a conda environment called diffuscene
using
conda env create -f environment.yaml
conda activate diffuscene
Next, compile the extension modules. You can do this via
python setup.py build_ext --inplace
pip install -e .
Install ChamferDistancePytorch
cd ChamferDistancePytorch/chamfer3D
python setup.py install
The pretrained models, results, and preprocessed datasets are available on Google Drive.
The pretrained models of DiffuScene (unconditional, text, scene_rearrangement) can be downloaded from here. The pretrained model of the ShapeAutoencoder can be downloaded from here. The Google Drive folder also contains the results of unconditional generation.
To train and evaluate DiffuScene directly without the tedious data preprocessing, we also provide the preprocessed 3D-FRONT and 3D-FUTURE datasets.
The training and evaluation are based on the 3D-FRONT and 3D-FUTURE datasets. To download both datasets, please refer to the instructions provided on the datasets' webpages.
To speed up preprocessing, you can specify the PATH_TO_SCENES
environment variable for all scripts. This file path points to the parsed
ThreedFutureDataset
after it has been pickled. To pickle it, simply run the following script:
python pickle_threed_future_dataset.py path_to_output_dir path_to_3d_front_dataset_dir path_to_3d_future_dataset_dir path_to_3d_future_model_info --dataset_filtering room_type
Based on the pickled ThreedFutureDataset, we also provide a script to pickle the sampled point clouds of object CAD models, which are used for shape autoencoder training and latent shape code extraction.
python pickle_threed_future_pointcloud.py path_to_output_dir path_to_3d_front_dataset_dir path_to_3d_future_dataset_dir path_to_3d_future_model_info --dataset_filtering room_type
For example,
python pickle_threed_future_dataset.py /cluster/balrog/jtang/3d_front_processed/ /cluster/balrog/jtang/3D-FRONT/ /cluster/balrog/jtang/3D-FUTURE-model /cluster/balrog/jtang/3D-FUTURE-model/model_info.json --dataset_filtering threed_front_livingroom --annotation_file ../config/livingroom_threed_front_splits.csv
PATH_TO_SCENES="/cluster/balrog/jtang/3d_front_processed/threed_front.pkl" python pickle_threed_future_pointcloud.py /cluster/balrog/jtang/3d_front_processed/ /cluster/balrog/jtang/3D-FRONT/ /cluster/balrog/jtang/3D-FUTURE-model /cluster/balrog/jtang/3D-FUTURE-model/model_info.json --dataset_filtering threed_front_livingroom --annotation_file ../config/livingroom_threed_front_splits.csv
Note that these two scripts should be executed separately for each room type, since
different room types contain different objects. For 3D-FRONT this means the
bedrooms and the living/dining rooms, so you have to run both scripts once per room type
with the corresponding --dataset_filtering
and --annotation_file
options (see the bedroom example below). Please check the help menu for
additional details.
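For instance, the corresponding bedroom commands would look like the following (the threed_front_bedroom filter value is assumed by analogy with the living-room example above; adjust the paths to your setup):
python pickle_threed_future_dataset.py /cluster/balrog/jtang/3d_front_processed/ /cluster/balrog/jtang/3D-FRONT/ /cluster/balrog/jtang/3D-FUTURE-model /cluster/balrog/jtang/3D-FUTURE-model/model_info.json --dataset_filtering threed_front_bedroom --annotation_file ../config/bedroom_threed_front_splits.csv
PATH_TO_SCENES="/cluster/balrog/jtang/3d_front_processed/threed_front.pkl" python pickle_threed_future_pointcloud.py /cluster/balrog/jtang/3d_front_processed/ /cluster/balrog/jtang/3D-FRONT/ /cluster/balrog/jtang/3D-FUTURE-model /cluster/balrog/jtang/3D-FUTURE-model/model_info.json --dataset_filtering threed_front_bedroom --annotation_file ../config/bedroom_threed_front_splits.csv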
Then you can train the shape autoencoder using all models from bedrooms/diningrooms/livingrooms.
cd ./scripts
PATH_TO_SCENES="/cluster/balrog/jtang/3d_front_processed/threed_front.pkl" python train_objautoencoder.py ../config/obj_autoencoder/bed_living_diningrooms_lat32.yaml your_objae_output_directory --experiment_tag "bed_living_diningrooms_lat32" --with_wandb_logger
Next, you can use the pretrained checkpoint of the shape autoencoder to extract latent shape codes for each room type. Take bedrooms as an example:
PATH_TO_SCENES="/cluster/balrog/jtang/3d_front_processed/threed_front.pkl" python generate_objautoencoder.py ../config/objautoencoder/bedrooms.yaml your_objae_output_directory --experiment_tag "bed_living_diningrooms_lat32"
Finally, you can run preprocess_data.py
to read and pickle the object properties (class label, location, orientation, size, and latent shape features) of each scene.
PATH_TO_SCENES="/cluster/balrog/jtang/3d_front_processed/threed_front.pkl" python preprocess_data.py /cluster/balrog/jtang/3d_front_processed/livingrooms_objfeats_32_64 /cluster/balrog/jtang/3D-FRONT/ /cluster/balrog/jtang/3D-FUTURE-model /cluster/balrog/jtang/3D-FUTURE-model/model_info.json --dataset_filtering threed_front_livingroom --annotation_file ../config/livingroom_threed_front_splits.csv --add_objfeats
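As a quick sanity check, you can inspect one of the preprocessed scenes. The Python sketch below assumes the preprocessing step writes one subdirectory per scene containing a boxes.npz file with per-object arrays (e.g., class labels, translations, sizes, angles, and, with --add_objfeats, latent shape features); the exact directory layout and key names may differ in your version of the code.

import numpy as np
from pathlib import Path

# Output directory of the preprocessing command above (illustrative path).
processed_dir = Path("/cluster/balrog/jtang/3d_front_processed/livingrooms_objfeats_32_64")

# Find the first scene folder that contains a boxes.npz (assumed file name).
scene_dir = next(p for p in sorted(processed_dir.iterdir()) if (p / "boxes.npz").exists())
boxes = np.load(scene_dir / "boxes.npz")

print("scene:", scene_dir.name)
for key in boxes.files:
    print(f"  {key}: shape {boxes[key].shape}")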
To train DiffuScene on 3D-FRONT bedrooms, unconditionally or conditioned on text, you can run
./run/train.sh
./run/train_text.sh
To perform unconditional and text-conditioned scene generation with our pretrained models, you can run
./run/generate.sh
./run/generate_text.sh
If you want to compute the evaluation metrics of bounding-box IoU and the average number of symmetric pairs, you can add the option --compute_intersec.
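For reference, the bounding-box IoU metric measures how much generated furniture pieces interpenetrate. The sketch below is a simplified version for axis-aligned 3D boxes parameterized by center and full size; the repository's implementation may additionally handle object orientations.

import numpy as np

def aabb_iou(center1, size1, center2, size2):
    # IoU of two axis-aligned 3D boxes given centers and full edge lengths.
    min1, max1 = center1 - size1 / 2, center1 + size1 / 2
    min2, max2 = center2 - size2 / 2, center2 + size2 / 2
    inter = np.clip(np.minimum(max1, max2) - np.maximum(min1, min2), 0.0, None)
    inter_vol = inter.prod()
    union = size1.prod() + size2.prod() - inter_vol
    return inter_vol / union if union > 0 else 0.0

# Two unit cubes overlapping by half along x -> IoU of 1/3.
print(aabb_iou(np.array([0.0, 0.0, 0.0]), np.array([1.0, 1.0, 1.0]),
               np.array([0.5, 0.0, 0.0]), np.array([1.0, 1.0, 1.0])))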
Please note that our current text-conditioned model generates a full scene configuration from a text prompt describing a partial scene (2-3 sentences).
If you want to evaluate our method with text prompts containing more sentences, you might need to re-train it.
To evaluate FID and KID from rendered 2D images of generated and reference scenes, you can run:
python compute_fid_scores.py $ground_truth_bedrooms_top2down_render_folder $generate_bedrooms_top2down_render_folder ../config/bedroom_threed_front_splits.csv
python compute_fid_scores.py $ground_truth_diningrooms_top2down_render_folder $generate_diningrooms_top2down_render_folder ../config/diningroom_threed_front_splits.csv
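For context, FID fits a Gaussian to the Inception features of each image set and measures the Fréchet distance between the two Gaussians. A minimal sketch of that distance, assuming you already have the two feature matrices (the script above takes care of rendering and feature extraction):

import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    # Frechet distance between Gaussians fitted to two (N x D) feature matrices.
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard small imaginary parts from numerical error
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))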
To evaluate improved precision and recall, you can run:
python improved_precision_recall.py $ground_truth_bedrooms_top2down_render_folder $generate_bedrooms_top2down_render_folder ../config/bedroom_threed_front_splits.csv
python improved_precision_recall.py $ground_truth_diningrooms_top2down_render_folder $generate_diningrooms_top2down_render_folder ../config/diningroom_threed_front_splits.csv
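Improved precision and recall (Kynkäänniemi et al., 2019) test whether each generated feature falls inside the k-nearest-neighbor manifold of the real features, and vice versa. A condensed sketch, again assuming precomputed feature matrices rather than the script's full rendering pipeline:

import numpy as np

def knn_radii(feats, k=3):
    # Distance from each point to its k-th nearest neighbor (excluding itself).
    d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
    return np.sort(d, axis=1)[:, k]

def coverage(queries, refs, radii):
    # Fraction of query points lying inside the k-NN ball of at least one reference point.
    d = np.linalg.norm(queries[:, None] - refs[None, :], axis=-1)
    return float(np.mean((d <= radii[None, :]).any(axis=1)))

def improved_precision_recall(real_feats, fake_feats, k=3):
    precision = coverage(fake_feats, real_feats, knn_radii(real_feats, k))
    recall = coverage(real_feats, fake_feats, knn_radii(fake_feats, k))
    return precision, recall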
Please also check out the following papers that explore similar ideas:
- LEGO-Net: Learning Regular Rearrangements of Objects in Rooms. [homepage]
- Learning 3D Scene Priors with 2D Supervision. [homepage]
- Sceneformer: Indoor Scene Generation with Transformers. [homepage]
- ATISS: Autoregressive Transformers for Indoor Scene Synthesis. [homepage]
- Scene Synthesis via Uncertainty-Driven Attribute Synchronization [pdf]
- Indoor Scene Generation from a Collection of Semantic-Segmented Depth Images [pdf]
- Fast and Flexible Indoor Scene Synthesis via Deep Convolutional Generative Models [pdf]
If you find DiffuScene useful for your work, please cite:
@inproceedings{tang2024diffuscene,
title={Diffuscene: Denoising diffusion models for generative indoor scene synthesis},
author={Tang, Jiapeng and Nie, Yinyu and Markhasin, Lev and Dai, Angela and Thies, Justus and Nie{\ss}ner, Matthias},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024}
}
Contact Jiapeng Tang for questions, comments and reporting bugs.
Most of the code is borrowed from ATISS. We thank Despoina Paschalidou for her great work and repositories.