Hannah Schieber, M. Sc., PhD Student @ FAU
I am a PhD student at the
Friedrich-Alexander University Erlangen-Nürnberg (FAU), Germany. I am currently researching
the topic of neural 3D content creation and guidance in 3D using extend reality.
Moreover, I am a research assistant at Technical University of Munich (TUM),
Germany, assisting as tutor
at several courses like:
Advanced
Topics in 3D Computer Vision
or Modern
Computer Vision Methods
I am currently a visiting PhD student at the University of Otago, New Zealand. There I am
researching the creation of 3D scenes in the Visual Computing Group under the supervision of
Prof. Stefanie Zollmann.
Prior that, I received my bachelor’s degree from HS Aalen, and my master’s degree from TH Ingolstadt, both in computer science. During my studies, I gained experience in the industry at MAPAL Dr. Kress KG, c-Com GmbH, Carl Zeiss AG and AUDI AG.
I am interested in computer vision, extended reality and in general in many things in life. Besides being passionate about my PhD studies I like to cycling and bouldering.
Interested in a research coperation? Feel free to contact me any time via e-mail:
Contact:
hannah dot schieber @ tum dot de /
Google Scholar
/
Twitter /
Github
Abstract Arxiv Website Dataset Toolbox
Estimating 6D object poses is a major challenge in 3D computer vision. Building on successful instance-level approaches, research is shifting towards category-level pose estimation for practical applications. Current category-level datasets, however, fall short in annotation quality and pose variety. Addressing this, we introduce HouseCat6D, a new category-level 6D pose dataset. It features 1) multi-modality with Polarimetric RGB and Depth (RGBD+P), 2) encompasses 194 diverse objects across 10 household categories, including two photometrically challenging ones, and 3) provides high-quality pose annotations with an error range of only 1.35 mm to 1.74 mm. The dataset also includes 4) 41 large-scale scenes with comprehensive viewpoint and occlusion coverage, 5) a checkerboard-free environment, and 6) dense 6D parallel-jaw robotic grasp annotations. Additionally, we present benchmark results for leading category-level pose estimation networks.
Guidance for assemblable parts is a promising field for augmented reality. Augmented reality assembly guidance requires 6D object poses of target objects in real time. Especially in time-critical medical or industrial settings, continuous and markerless tracking of individual parts is essential to visualize instructions superimposed on or next to the target object parts. In this regard, occlusions by the user's hand or other objects and the complexity of different assembly states complicate robust and real-time markerless multi-object tracking. To address this problem, we present Graph-based Object Tracking (GBOT), a novel graph-based single-view RGB-D tracking approach. The real-time markerless multi-object tracking is initialized via 6D pose estimation and updates the graph-based assembly poses. The tracking through various assembly states is achieved by our novel multi-state assembly graph. We update the multi-state assembly graph by utilizing the relative poses of the individual assembly parts. Linking the individual objects in this graph enables more robust object tracking during the assembly process. For evaluation, we introduce a synthetic dataset of publicly available and 3D printable assembly assets as a benchmark for future work. Quantitative experiments in synthetic data and further qualitative study in real test data show that GBOT can outperform existing work towards enabling context-aware augmented reality assembly guidance. Dataset and code will be made publically available.
Search tasks can be challenging for blind or visually impaired people. To determine an object's location and to navigate there, they often rely on the limited sensory capabilities of a white cane, search haptically, or ask for help. We introduce MR-Sense, a mixed reality assistant to support search and navigation tasks. The system is designed in a participatory fashion and utilizes sensory data of a standalone mixed reality head-mounted display to perform deep learning-driven object recognition and environment mapping. The user is supported in object search tasks via spatially mapped audio and vibrotactile feedback. We conducted a preliminary user study including ten blind or visually impaired participants and a final user evaluation with thirteen blind or visually impaired participants. The final study reveals that MR-Sense alone cannot replace the cane but provides a valuable addition in terms of usability and task load. We further propose a standardized evaluation setup for replicable studies and highlight relevant potentials and challenges fostering future work towards employing technology in accessibility.
Abstract Elsevier CVIU SSRN Preprint Website
Deep learning-based object recognition, 6D pose estimation, and semantic scene understanding requires a large amount of training data to achieve generalization. Time-consuming annotation processes, privacy, and security aspects lead to a scarcity of real-world datasets. To overcome this lack of data, synthetic data generation has been proposed, including multiple facets in the area of domain randomization to extend the data distribution. The objective of this review is to identify methods applied for synthetic data generation aiming to improve 6D pose estimation, object recognition, and semantic scene understanding in indoor scenarios. We further review methods used to extend the data distribution and discuss best practices to bridge the gap between synthetic and real-world data.
We present a modular approach allowing the flexible exchange of the individual part, i.e. the camera or SLAM algorithm. This work presents results from a pilot study involving five participants to gain an impression of what kind of visualization type would be preferred and whether the point cloud overlay would assist the user in recognizing changes in the surroundings. The point cloud overlay enabled the participants to perceive more changes. The pilot study revealed that 60% of the participants showed a preference for the point cloud overlay over the pure mesh representation.
Human cognition relies on embodiment as a fundamental mechanism. Virtual avatars allow users to experience the adaptation, control, and perceptual illusion of alternative bodies. Although virtual bodies have medical applications in motor rehabilitation and therapeutic interventions, their potential for learning anatomy and medical communication remains underexplored. For learners and patients, anatomy, procedures, and medical imaging can be abstract and difficult to grasp. Experiencing anatomies, injuries, and treatments virtually through one's own body could be a valuable tool for fostering understanding. This work investigates the impact of avatars displaying anatomy and injuries suitable for such medical simulations. We ran a user study utilizing a skeleton avatar and virtual injuries, comparing to a healthy human avatar as a baseline. We evaluate the influence on embodiment, well-being, and presence with self-report questionnaires, as well as motor performance via an arm movement task. Our results show that while both anatomical representation and injuries increase feelings of eeriness, there are no negative effects on embodiment, well-being, presence, or motor performance. These findings suggest that virtual representations of anatomy and injuries are suitable for medical visualizations targeting learning or communication without significantly affecting users' mental state or physical control within the simulation.
Objective: In the last two decades, there has been a growing interest in exploring surgical procedures with statistical models to analyze operations at different semantic levels. This information is necessary for developing context-aware intelligent systems, which can assist the physicians during operations, evaluate procedures afterward or help the management team to effectively utilize the operating room. The objective is to extract reliable patterns from surgical data for the robust estimation of surgical activities performed during operations. The purpose of this article is to review the state-of-the-art deep learning methods that have been published after 2018 for analyzing surgical workflows, with a focus on phase and step recognition. Methods: Three databases, IEEE Xplore, Scopus, and PubMed were searched, and additional studies are added through a manual search. After the database search, 343 studies were screened and a total of 44 studies are selected for this review. Conclusion: The use of temporal information is essential for identifying the next surgical action. Contemporary methods used mainly RNNs, hierarchical CNNs, and Transformers to preserve long-distance temporal relations. The lack of large publicly available datasets for various procedures is a great challenge for the development of new and robust models. As supervised learning strategies are used to show proof-of-concept, self-supervised, semi-supervised, or active learning methods are used to mitigate dependency on annotated data. Significance: The present study provides a comprehensive review of recent methods in surgical workflow analysis, summarizes commonly used architectures, datasets, and discusses challenges.
Error-free surgical procedures are crucial for a patient's health. However, with the increasing complexity and variety of surgical instruments, it is difficult for clinical staff to acquire detailed assembly and usage knowledge leading to errors in process and preparation steps. Yet, the gold standard in retrieving necessary information when problems occur is to get the paperbased manual. Reading through the necessary instructions is time-consuming and decreases care quality. We propose ARTFM, a process integrated manual, highlighting the correct parts needed, their location, and step-by-step instructions to combine the instrument using an augmented reality head-mounted display.
Persons affected by blindness or visual impairments are challenged by spatially understanding unfamiliar environments. To obtain such understanding, they have to sense their environment closely and carefully. Especially objects outside the sensing area of analog assistive devices, such as a white cane, are simply not perceived and can be the cause of collisions. This project proposes a mixed reality guidance system that aims at preventing such problems. We use object detection and the 3D sensing capabilities of a mixed reality head mounted device to inform users about their spatial surroundings.
Robust environment perception for autonomous vehicles is a tremendous challenge, which makes a diverse sensor set with e.g. camera, lidar and radar crucial. In the process of understanding the recorded sensor data, 3D semantic segmentation plays an important role. Therefore, this work presents a pyramid-based deep fusion architecture for lidar and camera to improve 3D semantic segmentation of traffic scenes. Individual sensor backbones extract feature maps of camera images and lidar point clouds. A novel Pyramid Fusion Backbone fuses these feature maps at different scales and combines the multimodal features in a feature pyramid to compute valuable multimodal, multi-scale features. The Pyramid Fusion Head aggregates these pyramid features and further refines them in a late fusion step, incorporating the final features of the sensor backbones. The approach is evaluated on two challenging outdoor datasets and different fusion strategies and setups are investigated. It outperforms recent range view based lidar approaches as well as all so far proposed fusion strategies and architectures.
We utilize Gaussian Fourier features to estimate extrinsic camera parameters and dynamically predict varying intrinsic camera parameters through the supervision of the projection error. Our approach outperforms existing joint optimization methods on LLFF and BLEFF. In addition to these existing datasets, we introduce a new dataset called iFF with varying intrinsic camera parameters. NeRFtrinsic Four is a step forward in joint optimization NeRF-based view synthesis and enables more realistic and flexible rendering in real-world scenarios with varying camera parameters.
Dynamic reconstruction with neural radiance fields (NeRF) requires accurate camera poses. These are often hard to retrieve with existing structure-from-motion (SfM) pipelines as both camera and scene content can change. We propose DynaMoN that leverages simultaneous localization and mapping (SLAM) jointly with motion masking to handle dynamic scene content. Our robust SLAM-based tracking module significantly accelerates the training process of the dynamic NeRF while improving the quality of synthesized views at the same time. Extensive experimental validation on TUM RGB-D, BONN RGB-D Dynamic and the DyCheck's iPhone dataset, three real-world datasets, shows the advantages of DynaMoN both for camera pose estimation and novel view synthesis.
In medical and industrial domains, providing guidance for assembly processes is critical to ensure efficiency and safety. Errors in assembly can lead to significant consequences such as extended surgery times, and prolonged manufacturing or maintenance times in industry. Assembly scenarios can benefit from in-situ AR visualization to provide guidance, reduce assembly times and minimize errors. To enable in-situ visualization 6D pose estimation can be leveraged. Existing 6D pose estimation techniques primarily focus on individual objects and static captures. However, assembly scenarios have various dynamics including occlusion during assembly and dynamics in the assembly objects appearance. Existing work, combining object detection/ 6D pose estimation and assembly state detection focuses either on pure deep learning-based approaches, or limit the assembly state detection to building blocks. To address the challenges of 6D pose estimation in combination with assembly state detection, our approach ASDF builds upon the strengths of YOLOv8, a real-time capable object detection framework. We extend this framework, refine the object pose and fuse pose knowledge with network-detected pose information. Utilizing our late fusion in our Pose2State module results in refined 6D pose estimation and assembly state detection. By combining both pose and state information, our Pose2State module predicts the final assembly state with precision. Our evaluation on our \ac{asdf} dataset shows that our Pose2State module leads to an improved assembly state detection and that the improvement of the assembly state further leads to a more robust 6D pose estimation. Moreover, on the GBOT dataset, we outperform the pure deep learning-based network, and even outperform the hybrid and pure tracking-based approaches.