Hannah Schieber, M. Sc., PhD Student @ FAU



I am a PhD student at the Friedrich-Alexander University Erlangen-Nürnberg (FAU), Germany. My current research focuses on neural 3D content creation and guidance in 3D using extended reality.
Moreover, I am a research assistant at the Technical University of Munich (TUM), Germany, where I tutor several courses such as Advanced Topics in 3D Computer Vision and Modern Computer Vision Methods.
I am currently a visiting PhD student at the University of Otago, New Zealand, where I research the creation of 3D scenes in the Visual Computing Group under the supervision of Prof. Stefanie Zollmann.

Prior to that, I received my bachelor’s degree from HS Aalen and my master’s degree from TH Ingolstadt, both in computer science. During my studies, I gained industry experience at MAPAL Dr. Kress KG, c-Com GmbH, Carl Zeiss AG, and AUDI AG.

I am interested in computer vision, extended reality, and, in general, many things in life. Besides being passionate about my PhD studies, I like cycling and bouldering.

Interested in a research cooperation? Feel free to contact me any time via e-mail:

Contact:
hannah dot schieber @ tum dot de  /  Google Scholar  /  Twitter  /  Github

News

  • [04/2024] One paper accepted as Highlight at CVPR 2024 (shared first author)
  • [03/2024] One paper and one poster accepted at IEEE VR 2024 (second author)
  • [02/2024] Research Visit at the University of Otago

Information

Projects: Below you can find my published research projects; preprints are listed at the bottom of the page.
Services:
Conferences
  I was a reviewer for IEEE VR, IEEE ISMAR, and the WICV'23 workshop at ICCV.
Journals
  For journals, I was a reviewer for Springer IJCARS.


Research



2024




HouseCat6D – A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios
H. Jung*, G. Zhai*, S.-C. Wu*, P. Ruhkamp*, H. Schieber*, P. Wang, G. Rizzoli, H. Zhao, S. D. Meier, D. Roth, N. Navab and B. Busam, *denotes equal contribution
CVPR, Highlight, 2024

Arxiv Website Dataset Toolbox

Estimating 6D object poses is a major challenge in 3D computer vision. Building on successful instance-level approaches, research is shifting towards category-level pose estimation for practical applications. Current category-level datasets, however, fall short in annotation quality and pose variety. Addressing this, we introduce HouseCat6D, a new category-level 6D pose dataset. It features 1) multi-modality with Polarimetric RGB and Depth (RGBD+P), 2) encompasses 194 diverse objects across 10 household categories, including two photometrically challenging ones, and 3) provides high-quality pose annotations with an error range of only 1.35 mm to 1.74 mm. The dataset also includes 4) 41 large-scale scenes with comprehensive viewpoint and occlusion coverage, 5) a checkerboard-free environment, and 6) dense 6D parallel-jaw robotic grasp annotations. Additionally, we present benchmark results for leading category-level pose estimation networks.



GBOT: Graph-Based 3D Object Tracking for Augmented Reality-Assisted Assembly Guidance
S. Li, H. Schieber, B. Egger, J. Kreimeier and D. Roth
IEEE VR 2024, Conference Paper

Arxiv GitHub

Guidance for assemblable parts is a promising field for augmented reality. Augmented reality assembly guidance requires 6D object poses of target objects in real time. Especially in time-critical medical or industrial settings, continuous and markerless tracking of individual parts is essential to visualize instructions superimposed on or next to the target object parts. In this regard, occlusions by the user's hand or other objects and the complexity of different assembly states complicate robust and real-time markerless multi-object tracking. To address this problem, we present Graph-based Object Tracking (GBOT), a novel graph-based single-view RGB-D tracking approach. The real-time markerless multi-object tracking is initialized via 6D pose estimation and updates the graph-based assembly poses. Tracking through various assembly states is achieved by our novel multi-state assembly graph, which we update using the relative poses of the individual assembly parts. Linking the individual objects in this graph enables more robust object tracking during the assembly process. For evaluation, we introduce a synthetic dataset of publicly available and 3D printable assembly assets as a benchmark for future work. Quantitative experiments on synthetic data and a further qualitative study on real test data show that GBOT can outperform existing work towards enabling context-aware augmented reality assembly guidance. Dataset and code will be made publicly available.
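
To make the idea of the multi-state assembly graph more concrete, here is a minimal, hypothetical Python sketch (not the released GBOT implementation): parts are nodes carrying tracked 6D poses, edges store the expected relative transform of an assembled pair, and an assembly state is considered reached when the observed relative pose matches the expected one within a tolerance.

```python
# Hypothetical sketch of a multi-state assembly graph (not the released GBOT code).
# Nodes are assembly parts with tracked 6D poses; edges store the expected relative
# transform between two parts when they are assembled.
import numpy as np

class AssemblyGraph:
    def __init__(self):
        self.poses = {}   # part name -> 4x4 pose in the camera frame
        self.edges = {}   # (part_a, part_b) -> expected 4x4 relative pose when assembled

    def add_part(self, name, pose):
        self.poses[name] = pose

    def add_assembly_edge(self, part_a, part_b, expected_relative):
        self.edges[(part_a, part_b)] = expected_relative

    def update_pose(self, name, pose):
        self.poses[name] = pose

    def relative_pose(self, part_a, part_b):
        return np.linalg.inv(self.poses[part_a]) @ self.poses[part_b]

    def assembled(self, part_a, part_b, t_tol=0.005, r_tol_deg=5.0):
        """Check whether two parts are in their assembled configuration."""
        observed = self.relative_pose(part_a, part_b)
        expected = self.edges[(part_a, part_b)]
        delta = np.linalg.inv(expected) @ observed
        t_err = np.linalg.norm(delta[:3, 3])                     # translation error (m)
        cos_angle = np.clip((np.trace(delta[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
        r_err = np.degrees(np.arccos(cos_angle))                 # rotation error (deg)
        return t_err < t_tol and r_err < r_tol_deg
```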



Towards Continuous Patient Care with Remote Guided VR-Therapy (Poster)
J. Kreimeier, H. Schieber, N. Lewis, M. Smietana, J. Reithmeier, V. Cnejevici, P. Prasad, A. Eid, M. Maier, D. Roth
IEEE VR 2024, Poster



MR-Sense: A Mixed Reality Environment Search Assistant for Blind and Visually Impaired People
H. Schieber, C. Kleinbeck, L. Theelke, M. Kraft, J. Kreimeier and D. Roth
IEEE AIxVR, 2024

IEEE GitHub/Website

Search tasks can be challenging for blind or visually impaired people. To determine an object's location and to navigate there, they often rely on the limited sensory capabilities of a white cane, search haptically, or ask for help. We introduce MR-Sense, a mixed reality assistant to support search and navigation tasks. The system is designed in a participatory fashion and utilizes sensory data of a standalone mixed reality head-mounted display to perform deep learning-driven object recognition and environment mapping. The user is supported in object search tasks via spatially mapped audio and vibrotactile feedback. We conducted a preliminary user study including ten blind or visually impaired participants and a final user evaluation with thirteen blind or visually impaired participants. The final study reveals that MR-Sense alone cannot replace the cane but provides a valuable addition in terms of usability and task load. We further propose a standardized evaluation setup for replicable studies and highlight relevant potentials and challenges fostering future work towards employing technology in accessibility.



Indoor Synthetic Data Generation: A Systematic Review
H. Schieber, K. C. Demir, C. Kleinbeck, S. H. Yang and D. Roth
Computer Vision and Image Understanding (CVIU), 2024

Elsevier CVIU SSRN Preprint Website

Deep learning-based object recognition, 6D pose estimation, and semantic scene understanding require a large amount of training data to achieve generalization. Time-consuming annotation processes, privacy, and security aspects lead to a scarcity of real-world datasets. To overcome this lack of data, synthetic data generation has been proposed, including multiple facets in the area of domain randomization to extend the data distribution. The objective of this review is to identify methods applied for synthetic data generation aiming to improve 6D pose estimation, object recognition, and semantic scene understanding in indoor scenarios. We further review methods used to extend the data distribution and discuss best practices to bridge the gap between synthetic and real-world data.



2023


A Modular Approach for 3D Reconstruction with Point Cloud Overlay (Poster)
H. Schieber, F. Schmid, M. Ul-Hassan, S. Zollmann and D. Roth
Poster Session at ISMAR, 2023

IEEE

We present a modular approach allowing the flexible exchange of individual parts, i.e., the camera or the SLAM algorithm. This work presents results from a pilot study involving five participants to gain an impression of which visualization type would be preferred and whether the point cloud overlay would assist the user in recognizing changes in the surroundings. The point cloud overlay enabled the participants to perceive more changes. The pilot study revealed that 60% of the participants preferred the point cloud overlay over the pure mesh representation.


Injured Avatars: The Impact of Embodied Anatomies and Virtual Injuries on Well-being and Performance
C. Kleinbeck, H. Schieber, J. Kreimeier, A. Martin-Gomez, M. Unberath and D. Roth
IEEE Transactions on Visualization and Computer Graphics, 2023

IEEE TVCG

Human cognition relies on embodiment as a fundamental mechanism. Virtual avatars allow users to experience the adaptation, control, and perceptual illusion of alternative bodies. Although virtual bodies have medical applications in motor rehabilitation and therapeutic interventions, their potential for learning anatomy and medical communication remains underexplored. For learners and patients, anatomy, procedures, and medical imaging can be abstract and difficult to grasp. Experiencing anatomies, injuries, and treatments virtually through one's own body could be a valuable tool for fostering understanding. This work investigates the impact of avatars displaying anatomy and injuries suitable for such medical simulations. We ran a user study utilizing a skeleton avatar and virtual injuries, comparing to a healthy human avatar as a baseline. We evaluate the influence on embodiment, well-being, and presence with self-report questionnaires, as well as motor performance via an arm movement task. Our results show that while both anatomical representation and injuries increase feelings of eeriness, there are no negative effects on embodiment, well-being, presence, or motor performance. These findings suggest that virtual representations of anatomy and injuries are suitable for medical visualizations targeting learning or communication without significantly affecting users' mental state or physical control within the simulation.


Deep Learning in Surgical Workflow Analysis: A Review of Phase and Step Recognition
Demir, K. C., Schieber, H., Weise, T., Roth, D., May, M., Maier, A., & Yang, S. H.
IEEE Journal of Biomedical and Health Informatics, 2023

IEEE JBHI

Objective: In the last two decades, there has been a growing interest in exploring surgical procedures with statistical models to analyze operations at different semantic levels. This information is necessary for developing context-aware intelligent systems, which can assist physicians during operations, evaluate procedures afterward, or help the management team to effectively utilize the operating room. The objective is to extract reliable patterns from surgical data for the robust estimation of surgical activities performed during operations. The purpose of this article is to review the state-of-the-art deep learning methods that have been published after 2018 for analyzing surgical workflows, with a focus on phase and step recognition. Methods: Three databases, IEEE Xplore, Scopus, and PubMed, were searched, and additional studies were added through a manual search. After the database search, 343 studies were screened and a total of 44 studies were selected for this review. Conclusion: The use of temporal information is essential for identifying the next surgical action. Contemporary methods mainly used RNNs, hierarchical CNNs, and Transformers to preserve long-distance temporal relations. The lack of large publicly available datasets for various procedures is a great challenge for the development of new and robust models. As supervised learning strategies are used to show proof-of-concept, self-supervised, semi-supervised, or active learning methods are used to mitigate dependency on annotated data. Significance: The present study provides a comprehensive review of recent methods in surgical workflow analysis, summarizes commonly used architectures and datasets, and discusses challenges.



2022


ARTFM: Augmented Reality Visualization of Tool Functionality Manuals in Operating Rooms. (Poster)
Kleinbeck, C., Schieber, H., Andress, S., Krautz, C., & Roth, D.
2022 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW) (pp. 736-737). IEEE.

IEEE

Error-free surgical procedures are crucial for a patient's health. However, with the increasing complexity and variety of surgical instruments, it is difficult for clinical staff to acquire detailed assembly and usage knowledge, leading to errors in process and preparation steps. Yet, the gold standard in retrieving necessary information when problems occur is to get the paper-based manual. Reading through the necessary instructions is time-consuming and decreases care quality. We propose ARTFM, a process-integrated manual, highlighting the correct parts needed, their location, and step-by-step instructions to combine the instrument using an augmented reality head-mounted display.


A Mixed Reality Guidance System for Blind and Visually Impaired People (Poster)
Schieber, H., Kleinbeck, C., Pradel, C., Theelke, L., & Roth, D.
2022 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW) (pp. 726-727). IEEE., 2022

IEEE

Persons affected by blindness or visual impairments are challenged by spatially understanding unfamiliar environments. To obtain such understanding, they have to sense their environment closely and carefully. Especially objects outside the sensing area of analog assistive devices, such as a white cane, are simply not perceived and can be the cause of collisions. This project proposes a mixed reality guidance system that aims at preventing such problems. We use object detection and the 3D sensing capabilities of a mixed reality head mounted device to inform users about their spatial surroundings.


Deep Sensor Fusion with Pyramid Fusion Networks for 3D Semantic Segmentation
H. Schieber*, F. Duerr*, T. Schoen and J. Beyerer, *denotes equal contribution
Intelligent Vehicles Symposium (IV), Aachen, Germany, 2022

IEEE Arxiv Website

Robust environment perception for autonomous vehicles is a tremendous challenge, which makes a diverse sensor set with e.g. camera, lidar and radar crucial. In the process of understanding the recorded sensor data, 3D semantic segmentation plays an important role. Therefore, this work presents a pyramid-based deep fusion architecture for lidar and camera to improve 3D semantic segmentation of traffic scenes. Individual sensor backbones extract feature maps of camera images and lidar point clouds. A novel Pyramid Fusion Backbone fuses these feature maps at different scales and combines the multimodal features in a feature pyramid to compute valuable multimodal, multi-scale features. The Pyramid Fusion Head aggregates these pyramid features and further refines them in a late fusion step, incorporating the final features of the sensor backbones. The approach is evaluated on two challenging outdoor datasets and different fusion strategies and setups are investigated. It outperforms recent range view based lidar approaches as well as all so far proposed fusion strategies and architectures.
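
As a rough illustration of multi-scale camera-lidar fusion, the following PyTorch sketch concatenates a resampled camera feature map with a lidar range-view feature map at each pyramid scale; channel sizes, resolutions, and layer choices are assumptions for illustration and do not reproduce the paper's architecture.

```python
# Illustrative sketch (PyTorch) of fusing camera and lidar feature maps at
# multiple scales; not the Pyramid Fusion Networks implementation.
import torch
import torch.nn as nn

class PyramidFusionBlock(nn.Module):
    """Fuses one camera feature map with one lidar (range-view) feature map."""
    def __init__(self, cam_channels, lidar_channels, out_channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_channels + lidar_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_feat, lidar_feat):
        # Resample camera features to the lidar range-view resolution before fusing.
        cam_feat = nn.functional.interpolate(
            cam_feat, size=lidar_feat.shape[-2:], mode="bilinear", align_corners=False
        )
        return self.fuse(torch.cat([cam_feat, lidar_feat], dim=1))

# Example: fuse features at two pyramid scales (shapes are arbitrary examples).
fuse_s1 = PyramidFusionBlock(cam_channels=64, lidar_channels=64, out_channels=128)
fuse_s2 = PyramidFusionBlock(cam_channels=128, lidar_channels=128, out_channels=256)
fused_1 = fuse_s1(torch.randn(1, 64, 96, 312), torch.randn(1, 64, 64, 2048))
fused_2 = fuse_s2(torch.randn(1, 128, 48, 156), torch.randn(1, 128, 32, 1024))
```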



Preprints


NeRFtrinsic Four: An End-To-End Trainable NeRF Jointly Optimizing Diverse Intrinsic and Extrinsic Camera Parameters
H. Schieber, F. Deuser, B. Egger, N. Oswald and D. Roth
2023

Arxiv GitHub

We utilize Gaussian Fourier features to estimate extrinsic camera parameters and dynamically predict varying intrinsic camera parameters through the supervision of the projection error. Our approach outperforms existing joint optimization methods on LLFF and BLEFF. In addition to these existing datasets, we introduce a new dataset called iFF with varying intrinsic camera parameters. NeRFtrinsic Four is a step forward in the joint optimization of NeRF-based view synthesis and enables more realistic and flexible rendering in real-world scenarios with varying camera parameters.
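
For readers unfamiliar with Gaussian Fourier features, the short sketch below shows the kind of random Fourier mapping referred to above, here applied to a normalized camera index feeding a small pose head; the dimensions and the MLP head are hypothetical and not taken from the paper.

```python
# Minimal sketch of a Gaussian Fourier feature mapping (illustrative only).
import torch
import torch.nn as nn

class GaussianFourierFeatures(nn.Module):
    def __init__(self, in_dim=1, num_features=128, sigma=10.0):
        super().__init__()
        # Random projection matrix B ~ N(0, sigma^2), kept fixed during training.
        self.register_buffer("B", torch.randn(in_dim, num_features) * sigma)

    def forward(self, x):
        proj = 2.0 * torch.pi * x @ self.B
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

# Hypothetical usage: map a normalized camera index to 6-DoF pose parameters.
encoder = GaussianFourierFeatures(in_dim=1, num_features=128)
pose_head = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 6))
cam_index = torch.tensor([[0.25]])           # camera id normalized to [0, 1]
pose_params = pose_head(encoder(cam_index))  # 3 rotation + 3 translation params
```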


V2: DynaMoN: Motion-Aware Fast And Robust Camera Localization for Dynamic Neural Radiance Fields
V1: DynaMoN: Motion-Aware Fast And Robust Camera Localization for Dynamic NeRF

N. Schischka*, H. Schieber*, M. A. Karaoglu*, M. Görgülü*, F. Grötzner, A. Ladikos, D. Roth, N. Navab and B. Busam, *denotes equal contribution
2024

Arxiv GitHub/Website

Dynamic reconstruction with neural radiance fields (NeRF) requires accurate camera poses. These are often hard to retrieve with existing structure-from-motion (SfM) pipelines as both camera and scene content can change. We propose DynaMoN, which leverages simultaneous localization and mapping (SLAM) jointly with motion masking to handle dynamic scene content. Our robust SLAM-based tracking module significantly accelerates the training process of the dynamic NeRF while improving the quality of synthesized views at the same time. Extensive experimental validation on TUM RGB-D, BONN RGB-D Dynamic and the DyCheck iPhone dataset, three real-world datasets, shows the advantages of DynaMoN both for camera pose estimation and novel view synthesis.
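
The core idea of motion masking can be illustrated in a few lines of PyTorch: pixels flagged as dynamic are excluded from the photometric residual used for camera tracking, so moving content does not corrupt the pose estimate. This is a conceptual sketch under assumed shapes and names, not the DynaMoN code.

```python
# Conceptual sketch: exclude dynamic pixels from the tracking residual.
import torch

def masked_photometric_loss(rendered, observed, motion_mask):
    """motion_mask: 1 for static pixels, 0 for dynamic pixels (H x W)."""
    residual = (rendered - observed).abs().mean(dim=-1)   # per-pixel error
    static = motion_mask > 0.5
    return residual[static].mean()

# Example with random data: a 2x2 dynamic region is ignored.
rendered = torch.rand(8, 8, 3)
observed = torch.rand(8, 8, 3)
mask = torch.ones(8, 8)
mask[2:4, 2:4] = 0.0                     # mark moving-object pixels as dynamic
loss = masked_photometric_loss(rendered, observed, mask)
```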



ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation
H. Schieber, S. Li, N. Correl, P. Beckerle, J. Kreimeier, and D. Roth
2024

Arxiv

In medical and industrial domains, providing guidance for assembly processes is critical to ensure efficiency and safety. Errors in assembly can lead to significant consequences such as extended surgery times, and prolonged manufacturing or maintenance times in industry. Assembly scenarios can benefit from in-situ AR visualization to provide guidance, reduce assembly times and minimize errors. To enable in-situ visualization, 6D pose estimation can be leveraged. Existing 6D pose estimation techniques primarily focus on individual objects and static captures. However, assembly scenarios have various dynamics, including occlusion during assembly and changes in the assembly objects' appearance. Existing work combining object detection / 6D pose estimation and assembly state detection either focuses on pure deep learning-based approaches or limits assembly state detection to building blocks. To address the challenges of 6D pose estimation in combination with assembly state detection, our approach ASDF builds upon the strengths of YOLOv8, a real-time capable object detection framework. We extend this framework, refine the object pose and fuse pose knowledge with network-detected pose information. Utilizing late fusion in our Pose2State module results in refined 6D pose estimation and assembly state detection. By combining both pose and state information, our Pose2State module predicts the final assembly state with precision. Our evaluation on our ASDF dataset shows that our Pose2State module leads to improved assembly state detection and that the improvement of the assembly state further leads to a more robust 6D pose estimation. Moreover, on the GBOT dataset, we outperform the pure deep learning-based network, and even outperform the hybrid and pure tracking-based approaches.
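
To illustrate the late-fusion idea behind a Pose2State-style decision, the following hypothetical NumPy sketch combines a detector's per-state scores with scores derived from per-state relative-pose errors; the weighting and scoring functions are assumptions for illustration only and do not reproduce the ASDF implementation.

```python
# Hypothetical late fusion of detection scores and pose-based evidence.
import numpy as np

def pose_state_scores(relative_pose_errors):
    """Convert per-state pose errors (smaller = better match) into scores."""
    errors = np.asarray(relative_pose_errors, dtype=float)
    scores = np.exp(-errors)
    return scores / scores.sum()

def fuse_state(detector_scores, relative_pose_errors, alpha=0.5):
    """Late fusion: weighted combination of detection and pose-based evidence."""
    detector = np.asarray(detector_scores, dtype=float)
    pose = pose_state_scores(relative_pose_errors)
    fused = alpha * detector + (1.0 - alpha) * pose
    return int(np.argmax(fused)), fused

# Example: three possible assembly states; state 1 wins after fusion.
state, scores = fuse_state(detector_scores=[0.2, 0.5, 0.3],
                           relative_pose_errors=[0.8, 0.1, 0.6])
```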


Copyright © Hannah Schieber | This is my personal webpage.