
14. FEATURES OF TEXT CATEGORIZATION OF COMMERCIAL CONTENT

14.1. INTRODUCTION

The active development of the Internet increases the demand for production and strategic data and promotes new forms of information services [1]. Documented information is an information product, or commercial content, if it is prepared in accordance with user needs and intended to meet them. The development and implementation of electronic content commerce systems is one of the strategic directions of e-business development. A characteristic feature of such systems is the automatic processing of information resources in order to increase content sales to regular users, actively attract potential users and expand the boundaries of the target audience [1]. The design, development, implementation and maintenance of electronic content commerce systems is a topical problem because of the active development of research in e-business. An important problem is the lack of theoretical justification, standardized methods and software for processing information resources in such systems. New approaches and solutions to this problem are emerging. However, the known methods and software for processing information resources do not match the principles on which electronic content commerce systems are built. There is no common approach to creating electronic content commerce systems and no standardized methods of processing information resources in them. The development of methods and tools for automatic processing of commercial text content is therefore important and topical in modern information technology [1-5]. Examples include systems for information retrieval, machine translation, semantic, statistical, optical and acoustic analysis and synthesis of speech, automated editing, knowledge extraction from text content, text abstracting and annotation, text indexing, training and didactics, management of linguistic corpora, and tools for compiling dictionaries of various types.
Specialists are actively seeking new descriptive models and methods for automatic processing of text content [2-4]. One such approach is the development of general principles of lexicographic systems of the syntactic type. It is important to build text content processing systems for specific languages on these principles [1, 5].

14.2. RECENT RESEARCH AND PUBLICATIONS ANALYSIS

Any syntactic analysis tool consists of two parts: a knowledge base about a particular natural language and a syntactic analysis algorithm (a set of standard operators that process text content using this knowledge) [1-5]. The source of grammatical knowledge is data from morphological analysis and various filled tables of concepts and linguistic units [2]. These tables result from experts' empirical processing of natural-language text content, carried out to identify the basic regularities needed for syntactic analysis. The tables of linguistic units consist of sets of configurations or valences (syntactic and semantic-syntactic dependencies) [2]. They are lists or dictionaries of lexical units that specify, for each unit, all its possible links with other units of expression in the natural language [2, 5]. When implementing syntactic analysis, the rules for transforming the table data should be fully independent of the table contents, so that a change in the contents does not require restructuring the algorithm.
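
The separation between the knowledge base and the algorithm can be illustrated with a minimal sketch. It assumes a simplified valence table keyed by word category. The table entries, the category names and the function allowed_links are hypothetical and serve only as an illustration. They are not taken from the cited works.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical valence table: for each head category, the categories of
# dependents it may link to. The entries are illustrative assumptions.
VALENCES: Dict[str, Set[str]] = {
    "verb": {"noun", "adverb"},
    "noun": {"adjective"},
}


def allowed_links(
    tagged: List[Tuple[str, str]],
    valences: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Return (head, dependent) word pairs permitted by the valence table.

    The function only looks entries up in the table, so changing the
    table contents does not require changing this code.
    """
    links = []
    for i, (head, head_cat) in enumerate(tagged):
        for j, (dep, dep_cat) in enumerate(tagged):
            if i != j and dep_cat in valences.get(head_cat, set()):
                links.append((head, dep))
    return links


# Input is assumed to be already tagged by morphological analysis.
sentence = [("quick", "adjective"), ("fox", "noun"), ("runs", "verb")]
links = allowed_links(sentence, VALENCES)
```

Here a different language or a richer set of dependencies would be handled by passing another table to the same function, which is the kind of independence the paragraph above asks for.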

The vocabulary V is a finite non-empty set of lexical units [2]. An expression over V is a finite-length string of lexical units from V. The empty string contains no lexical units and is denoted by λ. The set of all expressions over V is denoted by V*. A language over V is a subset of V*. A language can be specified by listing all its expressions or by defining criteria that an expression must satisfy to belong to the language [2]. Another important way to specify a language is through a generative grammar. A grammar consists of a set of lexical units of various types and a set of rules, or productions, for constructing expressions. A grammar has a vocabulary V, which is the set of lexical units used to build the expressions of the language. Some lexical units of the vocabulary (terminals) cannot be replaced by other lexical units.
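
As an illustration of how a generative grammar specifies a language, the following minimal sketch enumerates the expressions derivable from a start unit. The toy terminals, the productions and the function generate are hypothetical assumptions chosen for brevity. The sketch also assumes that no production yields an empty right-hand side, so pruning by length is safe.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy grammar; units and productions are illustrative only.
TERMINALS: Set[str] = {"buy", "sell", "content"}
PRODUCTIONS: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("Verb", "Noun")],
    "Verb": [("buy",), ("sell",)],
    "Noun": [("content",)],
}


def generate(start: str = "S", max_len: int = 4) -> Set[Tuple[str, ...]]:
    """Enumerate terminal strings derivable from `start`.

    Only strings of at most `max_len` lexical units are returned.
    Assumes no empty productions, so forms never shrink.
    """
    language: Set[Tuple[str, ...]] = set()
    pending = [(start,)]
    seen = {(start,)}
    while pending:
        form = pending.pop()
        # Position of the leftmost unit that can still be replaced.
        idx = next((i for i, u in enumerate(form) if u not in TERMINALS), None)
        if idx is None:
            language.add(form)  # only terminals remain
            continue
        for rhs in PRODUCTIONS.get(form[idx], []):
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                pending.append(new_form)
    return language


language = generate()
```

Terminal units are never rewritten, which matches the remark that some lexical units of the vocabulary cannot be replaced by others.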

14.3. RESEARCH RESULTS ANALYSIS

The formation of commercial content for an information resource links the set of input data from different sources with the set of commercial content stored in the corresponding database of an electronic content commerce system. This link can be presented as a chain of transitions: content source → content matched from the source → set of relevant source data → content formation operator, applied at a fixed time under the appropriate conditions → content formed under these conditions → set of generated content → commercial content database.

The content formation model in electronic content commerce systems comprises the following components: the set of input data from different information resources or moderators; the content collecting/creating operator, which gathers content from various sources; the content formatting operator; the operator that identifies the key words and concepts of the content; the content categorization operator; the content duplicate detection operator; the content digest formation operator; the content selective distribution operator; the transaction time of content formation; and the resulting set of commercial content. Content formation is described by a formation operator that depends on the set of content formation conditions.
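
The sequence of operators listed above can be sketched as a simple processing pipeline. This is only an illustration under stated assumptions. All function names, the frequency-based key word selection, the overlap-based categorization and the sample rubrics are hypothetical and are not the methods of the cited works.

```python
from collections import Counter
from typing import Dict, List, Set


def collect(sources: List[List[str]]) -> List[str]:
    """Collecting operator: gather raw texts from all sources."""
    return [text for source in sources for text in source]


def normalize(text: str) -> str:
    """Formatting operator (assumed here to be whitespace clean-up)."""
    return " ".join(text.split())


def key_words(text: str, top: int = 3) -> List[str]:
    """Key word operator: most frequent words longer than three letters."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    counts = Counter(w for w in words if len(w) > 3)
    return [w for w, _ in counts.most_common(top)]


def categorize(words: List[str], rubrics: Dict[str, Set[str]]) -> str:
    """Categorization operator: rubric with the largest key word overlap."""
    best, best_score = "other", 0
    for name, vocab in rubrics.items():
        score = len(vocab & set(words))
        if score > best_score:
            best, best_score = name, score
    return best


def digest(text: str, limit: int = 40) -> str:
    """Digest operator: a truncated summary of the text."""
    return text if len(text) <= limit else text[:limit].rstrip() + "..."


def form_content(sources: List[List[str]],
                 rubrics: Dict[str, Set[str]]) -> List[dict]:
    """Apply the operators in the order given in the text."""
    content, seen = [], set()
    for text in map(normalize, collect(sources)):
        words = key_words(text)
        category = categorize(words, rubrics)
        if text.lower() in seen:  # duplicate detection operator
            continue
        seen.add(text.lower())
        content.append({"text": text, "key_words": words,
                        "category": category, "digest": digest(text)})
    return content


def distribute(content: List[dict]) -> Dict[str, List[dict]]:
    """Selective distribution operator: group content by category."""
    by_category: Dict[str, List[dict]] = {}
    for item in content:
        by_category.setdefault(item["category"], []).append(item)
    return by_category


sources = [["Cheap  laptop sale:  laptop prices drop", "Football final tonight"],
           ["cheap laptop sale: laptop prices drop"]]
rubrics = {"tech": {"laptop"}, "sport": {"football"}}
content = form_content(sources, rubrics)
channels = distribute(content)
```

Each operator is kept as a separate function so that any one of them can be replaced, for example by a trained classifier in place of categorize, without touching the rest of the chain.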
