Abstract
Accurate prediction of stock market dynamics remains a formidable challenge because financial time series are noisy, non-stationary, and non-linear. Traditional econometric models and standalone machine learning models often struggle to capture the complex, multi-modal dependencies that drive market movements. This paper proposes a novel Hybrid Deep Learning Framework (HDLF) that integrates a Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network, and a Self-Attention mechanism for stock price prediction and directional trend classification. The HDLF uses the CNN to extract robust short-term local (spatial) features from technical indicators, the LSTM to model long-term temporal dependencies in price sequences, and the Attention mechanism to dynamically weight the most relevant past information. The framework incorporates diverse feature sets, including high-frequency OHLCV (Open, High, Low, Close, Volume) data, derived technical indicators, and text-based market sentiment data. With these inputs, the hybrid architecture consistently outperforms benchmark models (ARIMA, standalone LSTM, and Random Forest) across various market conditions. Evaluation using Root Mean Square Error (RMSE) for regression, the F1-Score for classification, and the Sharpe Ratio for simulated trading demonstrates a significant improvement in both predictive accuracy and financial profitability. The findings affirm the superiority of hybrid AI approaches in financial forecasting and offer a resilient, comprehensive tool for investors and quantitative analysts.
Keywords: Stock Market Prediction, Hybrid AI, Deep Learning, Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Attention Mechanism, Time Series Analysis, Trend Classification, Financial Forecasting

1. Introduction
1.1 Background and Motivation
The stock market, a cornerstone of global finance, is often modeled as a complex adaptive system influenced by economic policies, geopolitical events, company fundamentals, and collective investor psychology. The quest to predict stock price movements accurately has driven the development of increasingly sophisticated quantitative methods. If that goal were achieved consistently, it would contradict the Efficient Market Hypothesis (EMH). Historically, prediction attempts relied on two primary schools of thought. Fundamental analysis estimates intrinsic value from financial statements. Technical analysis forecasts prices from historical price and volume data. The advent of high-frequency trading and vast data availability has catalyzed a paradigm shift toward algorithmic modeling using Artificial Intelligence (AI) and Machine Learning (ML). Early ML models such as Support Vector Machines (SVM) and Random Forests (RF) proved useful for capturing non-linear relationships. However, they often failed to exploit the sequential nature of time series data. Deep Learning (DL) architectures, particularly Recurrent Neural Networks (RNNs) and variants such as LSTMs, addressed temporal dependencies. Even so, they sometimes could not filter high-frequency noise effectively or extract robust local patterns. The primary motivation of this research is to overcome these limitations of standalone models by proposing and evaluating a robust hybrid architecture.
By combining the strengths of different DL components, the goal is to create a model that is resilient to market noise, capable of handling multi-modal data inputs, and superior in both price forecasting (regression) and directional movement prediction (classification).

2. Problem Statement
Stock market data presents several critical challenges for modeling:
High Volatility and Non-Stationarity: The statistical properties of the market change rapidly over time. This leads to concept drift, in which models trained on past data quickly become obsolete.
Multi-Dimensional Dependency: Price movements depend not only on past prices (temporal features). They also depend on simultaneous events, related stocks, volume shifts, and unstructured sentiment data (spatial/multi-modal features).
Low Signal-to-Noise Ratio: The true predictive signal is often masked by random market fluctuations and external, unpredictable events.
Standalone models cannot address all three challenges simultaneously. For example, ARIMA assumes linear dynamics, while a standalone LSTM models sequences. The problem is therefore to design a Hybrid Deep Learning Framework (HDLF) that extracts two kinds of features from a heterogeneous data stream: fine-grained local patterns (spatial features) and long-range sequential trends (temporal features). The framework should yield measurable improvements in prediction accuracy and trend classification.

3. Methodology
3.1 Data Acquisition and Feature Engineering
The reliability of a predictive model hinges on the quality and breadth of its input data. This study uses a multi-modal feature set divided into three groups:

3.2 Historical Price Data (OHLCV)
Daily and intraday (e.g., 30-minute) Open, High, Low, Close, and Volume data are collected for the constituents of a major stock index (e.g., the S&P 500). The collection period spans at least 10 years, so that the dataset is large enough and covers several economic cycles.

3.3 Technical Indicators (TI)
TIs serve as critical features that quantify momentum, volatility, and trend.
The chosen set includes:
Trend Indicators: Moving Average Convergence Divergence (MACD) and Simple Moving Averages (SMA-10, SMA-50).
Momentum Indicators: Relative Strength Index (RSI) and the Stochastic Oscillator.
Volatility Indicators: Bollinger Bands (%B, Bandwidth) and Average True Range (ATR).

3.4 Market Sentiment Data
Unstructured data is crucial for capturing sudden market shifts. Financial news headlines and Twitter (X) posts related to the selected stocks are collected. A pre-trained Natural Language Processing (NLP) model, such as FinBERT, assigns each trading day a sentiment score on a scale from highly negative to highly positive. This score is then integrated as a numerical feature.

3.5 Data Preprocessing
Robust preprocessing is essential for deep learning:
Normalization: All numerical features (OHLCV, TI, Sentiment) are scaled to the range [0, 1] using Min-Max normalization. This prevents features with large magnitudes (such as Volume or Price) from dominating smaller-scale features (such as RSI or Sentiment).
Sequence Formatting: The data is transformed into a supervised learning format. Each input sample is a window of past observations ending at day $t$, and the output targets are:
Regression Target: the normalized closing price at $t+1$.
Classification Target: the directional movement at $t+1$, categorized as:
Class 1 (Up): $\text{Close}_{t+1} \ge \text{Close}_t + \delta$
Class 0 (Stable/Neutral): $\text{Close}_t - \delta < \text{Close}_{t+1} < \text{Close}_t + \delta$
Class -1 (Down): $\text{Close}_{t+1} \le \text{Close}_t - \delta$
where $\delta$ is a predefined threshold (e.g., 0.5% of the price) that filters out noise.
Train-Test Split: The dataset is split chronologically to preserve the time series ordering. The earliest portion is used for training, the following portion for validation, and the most recent portion for testing.

3.6 Proposed Hybrid Deep Learning Framework (HDLF)
The proposed HDLF combines three core components in sequence, as illustrated in the conceptual architecture below.

3.7 Model Architecture
The architecture is structured to leverage the specific strengths of each component:
CNN Layer (Feature Extraction): The input multi-feature sequence is first passed through a 1D Convolutional layer.
This layer acts as a local pattern detector. It applies 32 filters with a kernel size of 3 to extract short-term, shift-invariant features across the input time window, such as micro-trends and sudden volatility spikes. A Max-Pooling layer then shortens the sequence while retaining the most salient features.
LSTM Layer (Temporal Modeling): The features extracted by the CNN layer are passed to the LSTM network, which learns long-range dependencies and the overall sequential context of the market. The LSTM operates on the CNN's filtered feature maps rather than on the raw inputs, so it is less exposed to high-frequency noise.
Self-Attention Layer (Dynamic Weighting): The LSTM hidden states are fed into a Self-Attention mechanism. This layer computes alignment scores between all previous hidden states and the current state. From these scores it produces a context vector that assigns higher weights to the most relevant historical time steps (e.g., days with extreme sentiment or volume). This is designed to improve the model's ability to focus on key market turning points.
Output Layer (Prediction): The context vector from the Attention mechanism is flattened and passed to two parallel Dense (fully connected) layers. One uses a linear activation for the Regression Task (price prediction). The other uses a Softmax activation for the Classification Task (trend prediction).

3.8 Evaluation Metrics
Because the model performs both regression and classification, a comprehensive set of metrics is employed.
Regression Metrics (Forecasting Price)
These measure the magnitude of the prediction error.
Root Mean Square Error (RMSE): the square root of the mean squared error. Squaring the residuals penalizes large errors more heavily.
Mean Absolute Error (MAE): the average absolute error, reported in the original price units after inverting the normalization.
Coefficient of Determination ($R^2$): the proportion of the variance in the dependent variable that is predictable from the independent variables. Higher is better.
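For concreteness, the three regression metrics above can be computed directly from their standard definitions. The Python sketch below is dependency-free and illustrative only. The helper names (`rmse`, `mae`, `r_squared`, `inverse_min_max`) are hypothetical and are not taken from the HDLF implementation. The sketch also assumes that normalized prices are mapped back to price units with the minimum and maximum used in the Min-Max scaling step before MAE is reported.

```python
import math
from typing import List, Sequence


def _check_lengths(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # The metrics are only defined for paired, non-empty series of equal length.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error: sqrt(mean((y - y_hat)^2))."""
    _check_lengths(y_true, y_pred)
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: mean(|y - y_hat|)."""
    _check_lengths(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check_lengths(y_true, y_pred)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def inverse_min_max(scaled: Sequence[float], lo: float, hi: float) -> List[float]:
    """Undo Min-Max scaling to [0, 1]: x = s * (hi - lo) + lo.

    `lo` and `hi` are assumed to be the minimum and maximum that were used
    when the series was scaled (hypothetical values in the example below).
    """
    return [s * (hi - lo) + lo for s in scaled]


if __name__ == "__main__":
    # Hypothetical toy prices for illustration, not results from this study.
    actual = [100.0, 102.0, 101.0, 103.0]
    predicted = [101.0, 101.0, 102.0, 103.0]
    print(round(rmse(actual, predicted), 4))  # 0.866
    print(mae(actual, predicted))  # 0.75
    print(r_squared(actual, predicted))  # 0.4
    print(inverse_min_max([0.0, 0.5, 1.0], 100.0, 110.0))  # [100.0, 105.0, 110.0]
```

In this sketch, `inverse_min_max` would be applied to both the true and the predicted normalized prices before MAE is computed, so that the error is expressed in price units. RMSE is always at least as large as MAE, and the gap widens when a few large errors dominate.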
Classification Metrics (Predicting Trend)
These are crucial for assessing the trading utility of the model.
Accuracy: the overall proportion of correctly predicted classes.
Precision, Recall, and F1-Score: These metrics are essential in finance, where false positives (e.g., predicting 'Up' when the price goes 'Down') are highly costly. The F1-Score is the harmonic mean of Precision and Recall.
Area Under the ROC Curve (AUC-ROC): measures the model's ability to distinguish between classes. For the three-class problem, it must be computed with a one-vs-rest or one-vs-one extension.
Financial Performance Metrics (Trading Simulation)
To assess practical
Kumar et al. (Thu,) studied this question.