Improving residential energy efficiency is essential for sustainability, and Non-Intrusive Load Monitoring (NILM) is a promising technology that provides detailed insight into energy consumption without requiring per-appliance metering. However, building accurate NILM models from publicly available datasets poses substantial challenges. This paper identifies critical pitfalls in dataset handling, event detection accuracy, and feature extraction, aspects that prior literature often leaves undiscussed. We propose novel algorithms to correct inaccurate event timestamps and to extract the transient signals associated with appliance state changes. We rigorously evaluate our pipeline on multiple public datasets, analyzing feature stability and assessing the impact of aggregated versus isolated data. The insights from our practical implementation and evaluation are intended to help researchers overcome common early-stage obstacles in developing robust NILM systems and to improve the applicability of those systems in real-world scenarios.