Nassirtoussi, Arman Khadjeh (2015) A multi-layer dimension reduction algorithm for text mining of news in forex / Arman Khadjeh Nassirtoussi. PhD thesis, University of Malaya.
Abstract
The information explosion has caused demand for customized text mining to skyrocket in nearly every area, including search engine development, spam filtering and text summarization. Each context requires its own customized text mining algorithms to achieve the best results. The specific context of this research is market prediction for the foreign exchange market. The objective is to use news headlines to predict market movements 1 to 3 hours after news release. Recent research in behavioral economics confirms that investors’ aggregate behavioral reactions to information released in the news can drive prices up or down. This theoretical basis constitutes the economic foundation of this investigation. Once the economic side of the problem is established, available systems in the literature that operate in a comparable context are reviewed. The major finding of this review is that context-specific text mining algorithms are lacking. The main underlying text mining challenge that deserves immediate attention is the sparse and high-dimensional nature of the feature space. Therefore, this work produces a multi-layer dimension reduction algorithm to respond to this need. The algorithm tackles a different root cause of the problem at each layer. The first layer, termed the Semantic Abstraction Layer, addresses the problem of co-reference in text mining, which contributes to sparsity. Co-reference occurs when two or more words in a text corpus refer to the same concept. This work produces a custom approach named Heuristic-Hypernyms Modeling, which recognizes words that share the same parent word and treats them as one entity. As a result, prediction accuracy increases significantly at this layer, which is attributed to appropriate noise reduction in the feature space.
The second layer, termed the Sentiment Integration Layer, integrates sentiment analysis capability into the algorithm by proposing a sentiment weight named SumScore that reflects investors’ sentiment. This layer reduces the dimensions by eliminating those that have zero value in terms of sentiment, and thereby improves prediction accuracy. The third layer encompasses a dynamic model creation algorithm, termed Synchronous Targeted Feature Reduction (STFR). It is suited to the challenge at hand, namely the mining of a stream of text. It updates the models with the most recent information available and, more importantly, ensures that the dimensions are reduced to a number that is many times smaller. The algorithm and each of its layers are extensively evaluated using real market data and news content across multiple years and have proven to be solid and superior to any other comparable solution. In addition to a well-rounded, multifaceted algorithm, this work contributes a much-needed research framework for this context, with a test bed of data that should make future research endeavors more convenient. The produced algorithm is scalable, and its modular design allows each of its layers to be improved in future research.
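As a rough illustration of the first two layers described in the abstract, the following Python sketch collapses words that share a parent word into one feature and then drops features with zero sentiment weight. It is a minimal sketch under stated assumptions, not the thesis's implementation: the `HYPERNYMS` map, the `SENTIMENT` lexicon, the function names and the sample headlines are all hypothetical, and `sentiment_weight` is a simple stand-in rather than the SumScore definition, which the abstract does not give. The third layer (STFR) is not sketched here.

```python
from collections import Counter

# Hypothetical hypernym map (assumption): each word points to a parent word.
HYPERNYMS = {
    "euro": "currency",
    "dollar": "currency",
    "yen": "currency",
    "surge": "rise",
    "jump": "rise",
    "plunge": "fall",
    "slump": "fall",
}

# Hypothetical sentiment lexicon (assumption): (positivity, negativity) scores.
SENTIMENT = {
    "rise": (0.6, 0.0),
    "fall": (0.0, 0.7),
    "crisis": (0.0, 0.9),
}


def abstract_tokens(headline):
    """Layer 1 stand-in: replace each word by its parent word, if one is known."""
    return [HYPERNYMS.get(word, word) for word in headline.lower().split()]


def sentiment_weight(feature):
    """Layer 2 stand-in: total sentiment strength; 0.0 for unknown features."""
    pos, neg = SENTIMENT.get(feature, (0.0, 0.0))
    return pos + neg


def reduce_features(headlines):
    """Count abstracted features, then drop those with zero sentiment weight."""
    counts = Counter()
    for headline in headlines:
        counts.update(abstract_tokens(headline))
    return {f: c for f, c in counts.items() if sentiment_weight(f) > 0.0}


# Made-up headlines, for illustration only.
headlines = [
    "Euro surge continues",
    "Dollar plunge deepens crisis",
    "Yen jump surprises markets",
]
raw_vocab = {word for h in headlines for word in h.lower().split()}
reduced = reduce_features(headlines)
print(len(raw_vocab), "raw features ->", len(reduced), "after both layers")
print(reduced)
```

In this toy run, "surge" and "jump" merge into the single feature "rise", and words with no sentiment entry are discarded, which mirrors how the two layers are said to shrink a sparse feature space.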
Item Type: | Thesis (PhD) |
---|---|
Additional Information: | Thesis (Ph.D.) -- Faculty of Computer Science and Information Technology, University of Malaya, 2015 |
Uncontrolled Keywords: | Multi-layer; Dimension; Reduction; Algorithm; Text mining; News in forex |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software |
Divisions: | Faculty of Computer Science & Information Technology |
Depositing User: | Mrs Nur Aqilah Paing |
Date Deposited: | 02 Oct 2015 10:20 |
Last Modified: | 02 Oct 2015 10:20 |
URI: | http://studentsrepo.um.edu.my/id/eprint/5937 |