
11. ANALYTICAL METHODS FOR WEB CONTENT PROCESSING

11.1. METHOD OF COMMERCIAL WEB CONTENT FORMING

The formation of commercial Web content for an information resource links the set of input data from different sources to the set of generated content stored in the corresponding database of an e-business system, i.e. $Source \to x_i \to X \to Formation(u_f, x_i, t_p) \to c_r \to C \to DataBase$, where $Source$ is the content source, $x_i$ is the i-th content item from a data source, $X$ is the content set from the corresponding data sources, $Formation(u_f, x_i, t_p)$ is the operator of commercial content formation from $x_i$ at the fixed time $t_p$ under the conditions $u_f$, $c_r$ is the r-th commercial content formed under these conditions, $C$ is the set of commercial content, and $DataBase$ is the database of commercial Web content.

The model of commercial Web content formation is presented as

$Formation = \langle X, Gathering, Formatting, KeyWords, Categorization, BackUp, Digest, Dissemination, T, C \rangle$, (1)

where $X$ is the set of input data $x_i$ from different information resources or moderators, $Gathering$ is the operator of content collecting/creating from various sources, $Formatting$ is the content formatting operator, $KeyWords$ is the operator of content keyword and concept identification, $Categorization$ is the operator of automatic content categorization, $BackUp$ is the operator of content duplication identification, $Digest$ is the operator of content digest formation, $Dissemination$ is the operator of selective distribution of Web content, $T$ is the transaction time of content formation, and $C$ is the set of Web content items $c_r$.
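As an illustration only, the composition of operators in model (1) can be sketched as a chain of functions applied to the gathered input set. The sketch is in Python; the names `gathering`, `formatting`, `drop_exact_duplicates` and `formation` are hypothetical, and the stage bodies are deliberately trivial placeholders rather than the operators of the described system.

```python
from functools import reduce
from typing import Callable, Iterable, List

# A stage maps one content list to another (assumed representation: plain strings).
Stage = Callable[[List[str]], List[str]]


def gathering(sources: Iterable[Iterable[str]]) -> List[str]:
    """Collect raw items from every source into one input list."""
    return [item for source in sources for item in source]


def formatting(items: List[str]) -> List[str]:
    """Placeholder formatting: collapse runs of whitespace."""
    return [" ".join(item.split()) for item in items]


def drop_exact_duplicates(items: List[str]) -> List[str]:
    """Placeholder duplicate check: case-insensitive exact match, first copy kept."""
    seen = set()
    unique = []
    for item in items:
        key = item.lower()
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def formation(sources: Iterable[Iterable[str]], stages: List[Stage]) -> List[str]:
    """Apply the stages in order to the gathered input, yielding the content list."""
    return reduce(lambda content, stage: stage(content), stages, gathering(sources))


if __name__ == "__main__":
    feeds = [["Hello   world", "Price list"], ["hello world"]]
    print(formation(feeds, [formatting, drop_exact_duplicates]))
```

Keeping each operator as an independent function makes it possible to reorder or replace stages without touching the others, which is the only design point this sketch is meant to convey.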

An optimal solution helps to navigate the dynamic input data from different sources. It provides for gathering information from the sources and distributing its fragments according to users' needs by the operator $Gathering(X, U_G, T_G)$, where $X$ is the content set from different data sources, $U_G$ is the set of conditions for gathering data from different sources, $Gathering$ is the operator of content gathering/creation, and $T_G$ is the content gathering/creation time.

Content duplication identification is described by the operator

$C = BackUp(X, U_B)$, (2)

where $X$ is the content set from different data sources, $U_B$ is the set of conditions for content duplication identification, $BackUp$ is the operator of content duplication identification, and $C$ is the content set. In e-business systems, content duplication identification relies on linguistic-statistical methods of searching for common terms, which form chains of verbal signatures in the content.
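The source does not specify how the verbal signatures are built or compared. A minimal sketch, assuming that a signature is a run of three consecutive words (a word shingle) and that two items are duplicates when the Jaccard overlap of their signature sets reaches a threshold, might look as follows. The shingle size, the threshold value and all function names are assumptions made for illustration.

```python
import re
from typing import List, Set, Tuple

Signature = Tuple[str, ...]


def signatures(text: str, size: int = 3) -> Set[Signature]:
    """Return the set of word shingles of the given size (assumed signature form)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard overlap of the two signature sets, in the range 0..1."""
    sa, sb = signatures(a, size), signatures(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def remove_duplicates(items: List[str], threshold: float = 0.5) -> List[str]:
    """Keep an item only if it is not too similar to any item already kept."""
    kept: List[str] = []
    for item in items:
        if all(similarity(item, other) < threshold for other in kept):
            kept.append(item)
    return kept


if __name__ == "__main__":
    news = [
        "The company released a new product line today",
        "The company released a new product line yesterday",
        "Stock markets fell sharply",
    ]
    print(remove_duplicates(news))
```

The pairwise comparison is quadratic in the number of items, which is acceptable for a sketch; a production system would normally index the signatures instead.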

Content syndication consists of training the data gathering programs on the structural features of individual sources (information resources, moderators, users, visitors, journalists, editors), scanning commercial content directly, and reducing it to a common XML format:

$C = Formatting(BackUp(X, U_B), U_{FR})$, (3)

where $Formatting$ is the operator of content formatting, $U_{FR}$ is the set of conditions for commercial content formatting, and $BackUp(X, U_B)$ is the content set after duplication identification (2).
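The reduction to a common XML format can be illustrated with Python's standard `xml.etree.ElementTree` module. The element names (`content_set`, `content`, `title`, `body`) and the input dictionary keys are hypothetical, since the source does not describe the actual schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def to_common_xml(items: List[Dict[str, str]]) -> str:
    """Serialize heterogeneous content items into one XML document.

    The schema used here is an assumption for illustration only.
    """
    root = ET.Element("content_set")
    for item in items:
        node = ET.SubElement(root, "content", source=item.get("source", "unknown"))
        ET.SubElement(node, "title").text = item.get("title", "")
        # Collapse whitespace so that every source yields a uniform body text.
        ET.SubElement(node, "body").text = " ".join(item.get("body", "").split())
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [
        {"source": "rss", "title": "A & B", "body": "some   text"},
        {"title": "No source given", "body": "text from a moderator"},
    ]
    print(to_common_xml(sample))
```

Special characters such as `&` are escaped by the serializer, so the output stays well-formed regardless of what the sources contain.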

Processing the content set $C$ to identify meaningful keywords is based on the principles of finding key terms in content. The approach follows Zipf's law and reduces to selecting words with a medium frequency of occurrence: the most frequently used words are filtered out with a stop dictionary, and rare words from the message text are not included. Keyword and concept identification is defined by the operator

$C = KeyWords(Formatting(BackUp(X, U_B), U_{FR}), U_K)$, (4)

where $KeyWords$ is the operator of keyword and concept identification applied to the formatted content, and $U_K$ is the set of conditions for keyword and concept identification. Commercial Web content classification and distribution are implemented through an information retrieval system for selective content distribution (the content router). Content is analyzed for categorization by the operator

 

(5)
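Returning to the keyword identification step (4), the medium-frequency selection can be sketched as follows. This is a simplified illustration under stated assumptions: the stop dictionary is a tiny hand-written set, and the bounds `min_count` and `max_count` are arbitrary illustrative values, not parameters taken from the source.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

# Illustrative stop dictionary; a real one would be much larger and language-specific.
STOP_WORDS: Set[str] = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def keywords(
    texts: Iterable[str],
    stop_words: Set[str] = STOP_WORDS,
    min_count: int = 2,
    max_count: int = 5,
) -> List[str]:
    """Select words whose total frequency lies in the band [min_count, max_count].

    Stop words are dropped first, rare words fall below min_count, and
    overly frequent words exceed max_count.
    """
    counts = Counter(
        word
        for text in texts
        for word in re.findall(r"\w+", text.lower())
        if word not in stop_words
    )
    selected = [word for word, n in counts.items() if min_count <= n <= max_count]
    # Most frequent first; ties are broken alphabetically for a stable order.
    return sorted(selected, key=lambda word: (-counts[word], word))


if __name__ == "__main__":
    messages = [
        "the price of the tablet",
        "tablet price drops",
        "new tablet in the store",
        "tablet tablet tablet",
    ]
    print(keywords(messages))
```

In this example "tablet" occurs six times and is rejected as too frequent, the words that occur once are rejected as rare, and only "price" remains in the medium-frequency band.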

 

 
