15. FEATURES OF THE CONTENT-ANALYSIS METHOD FOR TEXT CATEGORIZATION OF COMMERCIAL CONTENT IN PROCESSING ONLINE NEWSWORK WORKS
15.1. INTRODUCTION
The development of methods and tools for the automatic processing of commercial text content is an important and topical problem in modern information technology [1-5]. Examples include information retrieval systems, machine translation, semantic and statistical analysis, optical and acoustic analysis and synthesis of speech, automated editing, knowledge extraction from text content, abstracting and annotation of text content, indexing of text content, training and didactic systems, management of linguistic constructions, and instrumental tools for compiling dictionaries of various types [6-15]. Specialists are actively seeking new description models and new methods for the automatic processing of text content [2-4]. One such direction is the development of general principles for lexicographic systems of the syntactic type, and it is important to build text-processing systems for specific languages on these principles [1, 5].
Over the last ten years, humanity has made a significant step in developing and implementing new technologies. These technologies have made it possible to solve many complex problems, but they also give rise to new problems that are difficult to solve. One of them is content analysis. Content-analysis methods and systems are used in many areas of human activity (politics, sociology, history, philology, computer science, journalism, medicine, etc.) [1-5]. Such systems are quite successful and do not require large funds or much time to obtain the desired result. At the same time, using this type of product makes it possible to increase the level of success by 60 %. A basic content-analysis system provides the following functions: fast information updates, searching for information on the resource, collecting data about customers and potential customers, creating and editing surveys, and analyzing visits to the resource.
Automating an information system for content analysis reduces the workload and the time needed to process and obtain the necessary information, and it increases the productivity of the system, which in turn lowers the money and time spent on getting the desired result. The topicality of this subject stems from the growing demands of the users of such systems and from the following factors: the rapid growth in demand for reliable information, the need to form sets of operational information, and the need for automatic filtering of unwanted information [1-5].
15.2. RECENT RESEARCH AND PUBLICATIONS ANALYSIS
Any syntactic analysis tool consists of two parts: a knowledge base about a particular natural language and a syntactic analysis algorithm, that is, a set of standard operators that process text content using this knowledge [1-5]. The sources of grammatical knowledge are the results of morphological analysis and various filled-in tables of concepts and linguistic units [2]. These tables are the result of experts' empirical processing of natural-language text content, carried out to identify the basic regularities needed for syntactic analysis. Tables of linguistic units specify configurations, or valency sets (syntactic and semantic-syntactic dependencies) [2]. In effect, they are lists (dictionaries) of lexical units in which every unit is supplied with all its possible links to other units of a natural-language expression [2, 5]. An implementation of syntactic analysis should make the rules that transform table data fully independent of the tables' contents, so that changing the contents does not require restructuring the algorithm.
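To make the separation between the knowledge base and the algorithm concrete, here is a minimal Python sketch, assuming a toy valency table and a toy morphological lexicon. The words, categories, and the function `link_dependencies` are hypothetical illustrations, not the authors' analyzer. The procedure contains no word-specific logic and only looks up the tables, so editing the tables changes the output without touching the code. It lists every link the table licenses and does not choose among competing attachments, which a real analyzer would still have to do.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical valency table: head word -> categories it may govern.
ValencyTable = Dict[str, Set[str]]
# Hypothetical morphological output: word -> grammatical category.
Lexicon = Dict[str, str]


def link_dependencies(
    tokens: List[str], lexicon: Lexicon, valency: ValencyTable
) -> List[Tuple[str, str]]:
    """Return every (head, dependent) pair licensed by the valency table.

    The loop never mentions a concrete word or category: each decision
    is a lookup in `lexicon` or `valency`, so the rules stay independent
    of the table contents.
    """
    links: List[Tuple[str, str]] = []
    for head in tokens:
        allowed = valency.get(head, set())  # words absent from the table govern nothing
        for dep in tokens:
            if dep != head and lexicon.get(dep) in allowed:
                links.append((head, dep))
    return links


# Illustrative knowledge base (all entries are assumptions).
lexicon = {"company": "noun", "sells": "verb", "software": "noun", "new": "adj"}
valency = {"sells": {"noun"}, "software": {"adj"}}

print(link_dependencies(["company", "sells", "new", "software"], lexicon, valency))
# → [('sells', 'company'), ('sells', 'software'), ('software', 'new')]
```

Adding an entry such as `valency["company"] = {"adj"}` would let `company` govern `new` as well, again without any change to `link_dependencies`.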
The vocabulary $V$ is a finite non-empty set of lexical units [2]. An expression over $V$ is a finite-length string of lexical units from $V$. The empty string contains no lexical units and is denoted by $\varepsilon$. The set of all expressions over $V$, including $\varepsilon$, is denoted by $V^{*}$. A language over $V$ is a subset $L \subseteq V^{*}$. A language can be specified by listing all of its expressions or by stating criteria that the expressions belonging to the language must satisfy [2]. Another important way to specify a language is by means of a generative grammar. A grammar consists of a set of lexical units of various types and a set of rules, or productions, for constructing expressions. A grammar has a vocabulary $V$, the set of lexical units from which the expressions of the language are built. Some lexical units of the vocabulary, the terminals, cannot be replaced by other lexical units.
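The definitions of vocabulary, expression, language, and generative grammar can be illustrated with a small Python sketch, assuming a toy vocabulary of hypothetical commercial terms and a hypothetical three-rule grammar. Expressions are modeled as tuples of lexical units, with the empty tuple standing for the empty string; the names `PRODUCTIONS`, `is_expression_over`, and `generate` are illustrative only.

```python
from typing import Dict, List, Set, Tuple

Expr = Tuple[str, ...]  # an expression: a finite string of lexical units

EMPTY: Expr = ()  # the empty string contains no lexical units

# Hypothetical grammar: terminals cannot be rewritten, nonterminals can.
TERMINALS: Set[str] = {"buy", "sell", "goods", "services"}
NONTERMINALS: Set[str] = {"S", "VERB", "OBJ"}
V: Set[str] = TERMINALS | NONTERMINALS  # the vocabulary

# Productions: a nonterminal may be replaced by any listed right-hand side.
PRODUCTIONS: Dict[str, List[Expr]] = {
    "S": [("VERB", "OBJ")],
    "VERB": [("buy",), ("sell",)],
    "OBJ": [("goods",), ("services",)],
}


def is_expression_over(expr: Expr, vocab: Set[str]) -> bool:
    """True if every unit of expr belongs to vocab (the empty string qualifies)."""
    return all(unit in vocab for unit in expr)


def generate(start: str = "S", max_steps: int = 10) -> Set[Expr]:
    """Collect terminal strings derivable from start by leftmost rewriting.

    max_steps bounds the number of rewriting rounds, which only matters
    for recursive grammars; unfinished forms are dropped when it is hit.
    """
    language: Set[Expr] = set()
    frontier: List[Expr] = [(start,)]
    for _ in range(max_steps):
        next_frontier: List[Expr] = []
        for form in frontier:
            # Position of the leftmost nonterminal, or None if there is none.
            idx = next((i for i, u in enumerate(form) if u in NONTERMINALS), None)
            if idx is None:
                language.add(form)  # only terminals remain: a sentence of the language
                continue
            for rhs in PRODUCTIONS[form[idx]]:
                next_frontier.append(form[:idx] + rhs + form[idx + 1:])
        frontier = next_frontier
    return language


L = generate()
print(sorted(L))
# → [('buy', 'goods'), ('buy', 'services'), ('sell', 'goods'), ('sell', 'services')]
print(is_expression_over(("sell", "goods"), V), ("sell", "goods") in L)  # → True True
print(is_expression_over(("goods", "sell"), V), ("goods", "sell") in L)  # → True False
```

The string `("goods", "sell")` is an expression over the vocabulary but cannot be derived, which shows that the language generated by the grammar is a proper subset of all expressions over $V$.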
15.3. RESEARCH RESULTS ANALYSIS
The development of Internet technologies and services has given humanity access to a virtually unlimited quantity of information, but, as often happens in such cases, this raises problems of reliability and timeliness. Content-analysis technologies are implemented to keep information timely and trustworthy. Using these technologies yields information about how a resource functions and makes it possible to intervene in the operation of the system in order to raise its level, the activity of the information resource, and its popularity among users. The world's leading developers and organizations in information resource processing, such as Google, AIIM, CM Professionals, EMC, IBM, Microsoft, Alfresco, Open Text, Oracle, and SAP, are actively working in this direction. Content analysis is a qualitative and quantitative method of studying information that is characterized by objective conclusions and a rigorous procedure; it consists in quantitative processing followed by interpretation of the results [1].
A content management system (CMS) is software for organizing websites or other information resources on the Internet or in computer networks [1]. Hundreds of CMSs are available today, and, depending on their functionality, they can be used in different areas. Despite the wide range of tools and technical capabilities that CMSs offer, the core properties of all content management systems are similar. A web content management system (WCMS) is a software package that provides functions for creating, editing, controlling, and organizing web pages. WCMSs are often used to create blogs, personal web pages, and online shops, and are intended for users who are not familiar with programming [1]. The following analysis stages are identified [5]: