An indoor scene structural estimation system and an estimation method based on a deep learning network are provided. An indoor scene structural estimation system based on a deep learning network includes a 2D encoder, a 2D plane decoder, a 2D edge decoder, a 2D corner decoder, and a 3D encoder. The 2D encoder receives an input image and encodes it. The 2D plane decoder is connected to the 2D encoder, decodes the encoded input image, and generates a 2D plane segment layout image. The 2D edge decoder is connected to the 2D encoder, decodes the encoded input image, and generates a 2D edge layout image. The 2D corner decoder is connected to the 2D encoder, decodes the encoded input image, and generates a 2D corner layout image.
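The shared-encoder, multi-decoder data flow described above can be illustrated with a minimal NumPy sketch. It is not the patented network. The layer choices (2x2 average pooling plus a per-pixel linear projection in the encoder, nearest-neighbour upsampling plus a projection in each decoder), the class name `LayoutEstimator2D`, the five plane classes, and the random untrained weights are all hypothetical. The 3D encoder is omitted because the abstract does not describe its inputs or outputs. The sketch shows only that the image is encoded once and that the same features feed three separate decoders, one per 2D layout image.

```python
import numpy as np


def encoder_2d(image, weights):
    """Hypothetical toy 2D encoder: 2x2 average pooling, then a per-pixel linear projection + ReLU."""
    h, w, c = image.shape
    cropped = image[: h - h % 2, : w - w % 2]  # crop to even size so 2x2 pooling tiles exactly
    pooled = cropped.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
    return np.maximum(pooled @ weights, 0.0)


def decoder_2d(features, weights):
    """Hypothetical toy 2D decoder: nearest-neighbour 2x upsampling, then a per-pixel linear projection."""
    up = features.repeat(2, axis=0).repeat(2, axis=1)
    return up @ weights


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LayoutEstimator2D:
    """One shared 2D encoder feeding plane, edge, and corner decoders (random, untrained weights)."""

    def __init__(self, in_ch=3, feat_ch=16, num_plane_classes=5, seed=0):
        rng = np.random.default_rng(seed)
        self.w_enc = rng.normal(scale=0.1, size=(in_ch, feat_ch))
        self.w_plane = rng.normal(scale=0.1, size=(feat_ch, num_plane_classes))  # assumed class count
        self.w_edge = rng.normal(scale=0.1, size=(feat_ch, 1))
        self.w_corner = rng.normal(scale=0.1, size=(feat_ch, 1))

    def __call__(self, image):
        feats = encoder_2d(image, self.w_enc)  # the image is encoded once
        plane = softmax(decoder_2d(feats, self.w_plane))  # per-pixel plane-class probabilities
        edge = sigmoid(decoder_2d(feats, self.w_edge))[..., 0]  # per-pixel edge score
        corner = sigmoid(decoder_2d(feats, self.w_corner))[..., 0]  # per-pixel corner score
        return plane, edge, corner


if __name__ == "__main__":
    image = np.random.default_rng(1).random((64, 64, 3))
    plane, edge, corner = LayoutEstimator2D()(image)
    print(plane.shape, edge.shape, corner.shape)  # (64, 64, 5) (64, 64) (64, 64)
```

Sharing one encoder means the image features are computed once and reused by every decoder. Each decoder only needs its own output head.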