Automatic Summarizing the News from Inform.kz by Using Natural Language Processing Tools

Introduction:

Every second, just under 3 million messages are sent; nearly 2 billion sites are active on the web; and around 2.5 quintillion bytes of information are generated every day [1] [2]. This is an enormous amount of information spread throughout the web, in forms such as images, video, audio, and text. This work focuses specifically on text. It is impossible for an individual to extract the gist of every text that addresses a given problem, because the data are huge and mostly contain far more unnecessary information than the main point under discussion. There is therefore a need to summarize such data quickly, so that the summary contains only the relevant and important information.

Abstract:

The rapid growth of information on the web has brought new problems of data access and processing. There is therefore a need for tools that help manage and handle Big Data quickly. The primary goal of this work is to propose an efficient method for automatic text summarization using Natural Language Processing (NLP) and Machine Learning (ML) techniques. This research introduces a brief, easily understandable, and uncomplicated implementation of this method in the Python programming language. Efficient performance is necessary in web search tasks, where an enormous amount of unstructured data needs to be summarized very quickly. The novelty of the work is that text summarization is implemented on Kazakh texts. The extractive summarization uses a new, keyword-focused approach. The contributions of the work are a manually created list of stop words for summarizing Kazakh text and a dataset constructed by scraping news from the country’s largest international news portal, www.inform.kz. The state-of-the-art results of the work show that automatic text summarization can be implemented for the Kazakh language.

Existing work:

A lot of work has been done on this topic, since Big Data and data summarization are now among the significant questions of modern research. For instance, the author of “Unsupervised Text Summarization using Sentence Embedding” proposes a text summarization approach for short texts. The process includes a number of data preparation operations, followed by data processing for summarization. The approach is well structured and the results show good performance.

Disadvantage:

However, the approach is limited to English, French, and Danish texts, and its implementation is complicated. Because it uses skip-thought vectors, training the model takes a long time (from 2 to 7 days).

Proposed work:

The primary goal of this work is to propose an efficient method for automatic text summarization using Natural Language Processing (NLP) and Machine Learning (ML) techniques. This research introduces a brief, easily understandable, and uncomplicated implementation of this method in the Python programming language. Efficient performance is necessary in web search tasks, where an enormous amount of unstructured data needs to be summarized very quickly. The novelty of the work is that text summarization is implemented on Kazakh texts. The extractive summarization uses a new, keyword-focused approach. The contributions of the work are a manually created list of stop words for summarizing Kazakh text and a dataset constructed by scraping news from the country’s largest international news portal, www.inform.kz.
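To make the idea of keyword-focused extractive summarization concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the implementation from this work: the stop-word set is a tiny placeholder rather than the manually created list, the names `summarize`, `tokenize`, and `split_sentences` are hypothetical, sentence scoring by average keyword frequency is one common choice among many, and the scraping of www.inform.kz is left out entirely.

```python
import re
from collections import Counter

# Placeholder stop words for illustration only; this is NOT the manually
# created Kazakh stop-word list described in the text.
KAZAKH_STOP_WORDS = {"және", "мен", "бұл", "үшін", "да", "де", "бір"}


def split_sentences(text):
    """Split text into sentences on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase the sentence, extract words, and drop stop words."""
    words = re.findall(r"\w+", sentence.lower())
    return [w for w in words if w not in KAZAKH_STOP_WORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences in their original order.

    A sentence's score is the average corpus frequency of its keywords
    (assumed scoring rule, chosen for simplicity).
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        if not words:
            return 0.0
        return sum(freq[w] for w in words) / len(words)

    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(sentences[i]),
        reverse=True,
    )
    keep = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    article = (
        "Алматы қаласында жаңа мектеп ашылды. "
        "Мектеп және жаңа мектеп Алматы қаласында. "
        "Ауа райы жылы."
    )
    print(summarize(article, max_sentences=1))
```

Averaging rather than summing the keyword frequencies keeps long sentences from winning purely on length; a real system would likely also need stemming, since Kazakh is agglutinative and the same keyword appears with many suffixes.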

Advantage:

Automatic text summarization can be implemented for the Kazakh language.

System requirements:

  Software requirements:

  • Operating system   :   Windows.
  • Coding Language  :   Python.

  Hardware components:

  • System                  :   Pentium IV 2.4 GHz or Intel.
  • Hard disk               :   40 GB.
  • Floppy drive            :   1.44 MB.
  • Mouse                   :   Optical mouse.
  • RAM                     :   512 MB.

Conclusion:

The primary goal of this work was to construct an efficient method of automatic text summarization using Natural Language Processing (NLP) and Machine Learning (ML) techniques, specifically for Kazakh-language texts. The results show state-of-the-art performance of the proposed algorithm on this task. Further research can include implementing abstractive text summarization for Kazakh texts as a new method. In addition, news summarization can be done per week and per month, with the main news identified not only by keywords but also by view counts.

March 14, 2022
