Undervalued Stock Scanner 2.0: From Spreadsheet Automation to Scalable Python Pipelines
This project rebuilds a spreadsheet-based stock screener into scalable Python ETL pipelines that apply financial logic and statistical methods to detect undervalued opportunities and monitor exit signals.
This new version of Undervalued Stock Scanner is a complete architectural shift from my original system. What began as financial spreadsheet modeling has evolved into modular Python pipelines that automate data ingestion, statistical processing, valuation screening, and exit monitoring.
Disclaimer: This project is for educational purposes only. The analysis, models, and methods used here are not financial advice. Investing carries risk, and any decisions based on this analysis are your responsibility. Always do your own research or consult with a qualified financial advisor before making investment decisions.
- Why Rebuild It?
- Re-Engineering the Pipeline
- Conceptual Framework Summary
- Explore the Dashboards
- Final Reflection
- 🔗 Related Projects
Why Rebuild It?
The original Undervalued Stock Scanner was built using Excel, Power Query, and VBA. I initially chose Excel because I wanted to use the Real-Time Data (RTD) function to stream live stock prices directly from Thinkorswim into my valuation models. At first, this seemed essential, since it gave me real-time price-based metrics like P/E, P/B, and P/FCF. In practice, however, I review screening results after hours, on the same day that I update financial data. Because the analysis is performed after market close, intraday price fluctuations don't affect my decision-making, and end-of-day pricing is sufficient for how I evaluate screening results.
I continued to use this model for a year because it worked well for structured analysis and dashboarding. Excel was flexible. Power Query handled data transformation well. VBA automated repetitive data refresh tasks. However, data processing was tied to a desktop workflow: fragmented data updates across separate sector workbooks, layered query dependencies, and logic embedded inside spreadsheet environments.
As a data analyst who’s always looking to streamline processes, I’m constantly searching for ways to make data workflows cleaner, more efficient, and easier to maintain. For this project, I wanted a more robust data processing method that offers:
- A clear separation between data processing and presentation
- Reproducible pipelines independent of UI tools
- Modular scripts instead of layered query chains
- Better statistical handling of outliers
- Centralized batch processing across sectors
- A system that could scale without becoming fragile
So I rebuilt the entire system in Python, all outside the constraints of Excel.
Re-Engineering the Pipeline
Instead of relying on fragmented workbook refresh cycles, the new system uses Python scripts that:
- Process raw sector CSV files and apply valuation logic in batch
- Monitor held positions for red flags
- Generate on-demand fundamental snapshots
Raw sector CSV exports are placed into predefined input folders. From there, the scripts automatically retrieve all relevant files, process them in batch, apply screening and statistical logic, and export Excel outputs into designated destination folders. Those outputs then serve as clean, structured data sources for the Power BI dashboards.
The process becomes: Drop files in → Run the scripts → Clean outputs generated → Refresh dashboard.
The scripts use Python’s os and glob libraries to locate input files and route outputs, while maintaining consistent naming conventions and directory organization. This design reduces manual error, improves reproducibility, and makes the workflow significantly easier to scale across sectors.
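The batch flow described above can be sketched roughly as follows. The folder names, file-naming pattern, and function names here are illustrative assumptions, not the project's actual code:

```python
import glob
import os

import pandas as pd


def find_sector_files(input_dir: str) -> list[str]:
    # Locate every raw sector CSV export dropped into the input folder,
    # sorted so batch runs are reproducible.
    return sorted(glob.glob(os.path.join(input_dir, "*.csv")))


def export_screened(df: pd.DataFrame, sector: str, output_dir: str) -> str:
    # Write a processed sector DataFrame to its designated Excel output,
    # following a consistent "<sector>_screened.xlsx" naming convention.
    os.makedirs(output_dir, exist_ok=True)
    out_path = os.path.join(output_dir, f"{sector}_screened.xlsx")
    df.to_excel(out_path, index=False)
    return out_path


def run_batch(input_dir: str, output_dir: str) -> list[str]:
    # Drop files in -> run -> clean outputs generated.
    outputs = []
    for path in find_sector_files(input_dir):
        sector = os.path.splitext(os.path.basename(path))[0]
        df = pd.read_csv(path)
        # Screening and statistical logic would run here before export.
        outputs.append(export_screened(df, sector, output_dir))
    return outputs
```

Because discovery is driven by the folder contents rather than a hard-coded file list, adding a new sector is just a matter of dropping another CSV into the input folder.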
Most importantly, calculation logic is now explicit and centralized in code rather than distributed across interconnected spreadsheet formulas.
Conceptual Framework Summary
The scanner evaluates key fundamentals like P/FCF, P/B, P/E, ROE, ROA, and A/E. To make comparisons fair, it assesses stocks relative to industry peers in the same market cap groups rather than across the entire market. Outliers are filtered using quartile bounds, and z-scores standardize each metric to allow consistent comparison between stocks.
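The two statistical steps, quartile-bound outlier filtering and peer-relative z-scoring, can be sketched like this. The `k` multiplier, column names (`industry`, `cap_group`), and exact grouping are assumptions for illustration:

```python
import pandas as pd


def filter_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    # Keep only values inside [Q1 - k*IQR, Q3 + k*IQR], a standard
    # quartile-based fence for dropping extreme metric values.
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s >= q1 - k * iqr) & (s <= q3 + k * iqr)]


def zscore_within_peers(df: pd.DataFrame, metric: str) -> pd.Series:
    # Standardize a metric relative to industry + market-cap peer groups,
    # so each stock is compared against its true peers, not the whole market.
    grouped = df.groupby(["industry", "cap_group"])[metric]
    return (df[metric] - grouped.transform("mean")) / grouped.transform("std")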
A stock passes the preliminary scan only if it meets all baseline conditions. Stocks that pass are then ranked using bonus signals.
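The gate-then-rank logic reads roughly as below. The specific conditions, metric columns, and bonus-score formula are hypothetical placeholders; the real framework defines its own thresholds against peer z-scores:

```python
import pandas as pd

# Hypothetical baseline conditions on peer-relative z-scores.
BASELINE = {
    "pe_z": lambda z: z < 0,    # cheaper than peer average on P/E
    "pb_z": lambda z: z < 0,    # cheaper than peer average on P/B
    "roe_z": lambda z: z > 0,   # more profitable than peer average
}


def screen(df: pd.DataFrame) -> pd.DataFrame:
    # Gate: a stock must pass EVERY baseline condition to survive.
    mask = pd.Series(True, index=df.index)
    for col, cond in BASELINE.items():
        mask &= df[col].apply(cond)
    passed = df[mask].copy()
    # Rank: survivors are ordered by bonus signals; here, higher quality
    # (roa_z) and cheaper free cash flow (lower pfcf_z) both score higher.
    passed["bonus_score"] = passed["roa_z"] - passed["pfcf_z"]
    return passed.sort_values("bonus_score", ascending=False)
```

Keeping the gate and the ranking as separate steps mirrors how the scanner works: baseline conditions decide *whether* a stock appears at all, and bonus signals only decide *where* it appears.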
A small change in business logic: REITs in the Real Estate sector are excluded. Metrics like AFFO, NAV, and NOI are more appropriate for evaluating REITs, but gathering these specialized metrics would require additional resources that I don’t have access to. For now, the scanner focuses on non-REIT companies, where standard fundamentals are meaningful and comparable.
One of the biggest upgrades is the Exit Signals Framework. The original project only identified undervalued stocks. Now, it also flags overvaluation, quality deterioration, and severe financial red flags for held positions. These are review triggers, not automatic sell signals. My goal was to design a system that helps me manage the full lifecycle of an investment decision.
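A minimal sketch of how such review triggers might be expressed, assuming hypothetical thresholds and column names (the real framework works from peer-relative z-scores and its own cutoffs):

```python
import pandas as pd


def exit_flags(row: pd.Series) -> list[str]:
    # Classify a held position into red-flag types. These are review
    # triggers for a human decision, not automatic sell signals.
    flags = []
    if row["pe_z"] > 2:                                  # far above peer valuation
        flags.append("overvaluation")
    if row["roe_z"] < -1:                                # profitability slipping vs peers
        flags.append("quality deterioration")
    if row["assets_to_equity"] > 10 or row["fcf"] < 0:   # leverage or cash-flow stress
        flags.append("severe red flag")
    return flags
```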
For a deeper dive into the original methodology and screening logic, see What Makes a Stock Undervalued?
Explore the Dashboards
Below are the interactive dashboards that bring this system to life.
Undervalued Stock Scanner Dashboard
👉 Open the dashboard in a new tab.
In this version, I added a drill-through page that allows users to right-click on any stock and navigate to a detailed profile view. This page displays all available fundamentals for the selected company, including its full company name, valuation metrics, and scoring signals. To bridge analytics with real-world research, the drill-through page also includes a direct link to the company’s profile on Yahoo Finance.
Exit Signals Dashboard
👉 Open the dashboard in a new tab.
The Exit Signals dashboard uses conditional formatting to highlight valuation and quality signals that classify stocks into red flag types. Its simple layout, combined with dynamic filtering, allows for focused risk review.
Final Reflection
Rebuilding this project required more than a language change from Excel formulas and Power Query M to Python. It challenged me to rethink where logic should live, how data should flow, how automation should scale, and how analytics should be engineered.
It also meant moving on from a system I had invested significant time and effort building. The original Excel, Power Query, and VBA framework worked, and I was proud of it. But improvement sometimes requires letting go of what already works in order to build something more robust. The rebuilt system reflects how I approach data and workflow challenges. I actively look for ways to streamline processes, even when that means redesigning them from the ground up. For me, improving systems isn’t about chasing new tools; it is about prioritizing accuracy, consistency, and long-term maintainability.
🔗 Related Projects
Undervalued Stock Trade Tracking
I also built an Undervalued Stock Trade Tracking system that automates the entire process of tracking and analyzing trade history of the Undervalued Stock strategy. Its Python-based ETL pipeline automatically accesses the file system to locate and retrieve the latest Thinkorswim statements, processes new trade history, and updates a clean dataset used for real-time Excel reporting.
Feel free to check it out for more details on how I handle trade history, ETL, and real-time Excel reporting.
👉 See Undervalued Stock Trade Tracking
Undervalued Stock Scanner 1.0
The original Undervalued Stock Scanner used Excel, Power Query, VBA, and Power BI to screen stocks based on financial valuation rules. It established the framework later rebuilt into version 2.0.
Explore it to learn more about the original concepts and legacy pipeline.