Friday, December 4, 2015

Web Mining, Some Definitions

According to (Mobasher, Jain, Han, & Srivastava, 1997), web mining is “the application of data mining and knowledge discovery techniques to data collected in the World Wide Web transactions.”

(Cooley, Mobasher, & Srivastava, 1997) define web mining as “the discovery and analysis of useful information from the World Wide Web” and the “application of data mining techniques to the World Wide Web.”

After a couple of months of reading papers on this subject, my take is that web mining covers the many ways to gain insight, most often into how users use a given web “site,” by applying data mining techniques to all the data that a web server accumulates. Nearly every web presence in 2015 is some form of web application, so mining the data produced on such servers goes well beyond who requested which “page” and when. The earliest papers focused on those three attributes because they gleaned their data almost exclusively from web server logs.
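To make the “who, which page, and when” idea concrete, here is a minimal sketch of pulling those three attributes out of a single server log line. It assumes the log uses the Common Log Format; the sample line, the `CLF_PATTERN` name, and the `parse_log_line` helper are made up for illustration and do not come from any of the cited papers.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "request" status bytes
# (assumption: the server writes this format; real logs often vary)
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_log_line(line):
    """Return (who, which_page, when) for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return match.group("host"), match.group("path"), when

# Hypothetical log line, for illustration only.
sample = '192.0.2.10 - - [04/Dec/2015:10:15:32 -0500] "GET /products/42 HTTP/1.1" 200 5120'

if __name__ == "__main__":
    print(parse_log_line(sample))
```

Note that the host address is only a rough stand-in for “who,” since several users can share one address.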

In 2015, we can mine new dimensions of a user's experience to form a more elaborate view of the user’s context, because we can gather much more data on the specifics of usage within a given “page.” We have also come to rank the importance of different tasks: in eCommerce, for instance, viewing an item versus purchasing an item.
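One simple way to express that ranking of tasks is to give each kind of event a weight and add the weights up per user and item. The sketch below assumes events arrive as (user, item, event type) tuples; the event names and the weight values in `EVENT_WEIGHTS` are hypothetical choices of mine, not figures from any of the papers.

```python
from collections import defaultdict

# Hypothetical weights: a purchase is assumed to signal more interest than a view.
EVENT_WEIGHTS = {"view": 1.0, "add_to_cart": 3.0, "purchase": 5.0}

def score_interest(events):
    """Sum event weights per (user, item) pair; unknown event types count as 0."""
    scores = defaultdict(float)
    for user, item, event_type in events:
        scores[(user, item)] += EVENT_WEIGHTS.get(event_type, 0.0)
    return dict(scores)

# Made-up events, for illustration only.
events = [
    ("u1", "book", "view"),
    ("u1", "book", "purchase"),
    ("u2", "book", "view"),
]

if __name__ == "__main__":
    for pair, score in score_interest(events).items():
        print(pair, score)
```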

So we now have richer sources of data originating from the web server, but we still tend to focus on how users are using these web applications so that we can improve the experience and create a more valuable application. This might be why the term “web usage mining” is more common than plain “web mining” in more recent papers.

(Mobasher, Cooley, & Srivastava, 2000) explain further that “web usage mining systems run any number of data mining algorithms on usage or clickstream data gathered from one or more Web sites in order to discover user profiles.” (Yang, Kou, Chen, & Li, 2007) explain that Web usage mining “is the application for data mining techniques to analyze and discover interesting patterns of user’s usage data on the web.”

A complete discussion of the processes and methods of web mining is beyond the scope of this post and is probably best covered in a future textbook, but I would like to quote (Arbelaitz et al., 2013) as they summarize this area of data mining research:

“Web mining can be defined as the application of machine learning techniques to data from the Internet. This process requires a data acquisition and pre-processing stage. The machine learning techniques are mainly applied in the pattern discovery and analysis phase to find groups of web users with common characteristics related to the Internet and the corresponding patterns or user profiles. Finally, the patterns detected in the previous steps are used in the operational phase to adapt the system and make navigation more efficient for new users or to extract important information for the service providers.”
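As a rough sketch of the pre-processing and pattern discovery stages described in that quote, the code below groups page requests into sessions and then counts page-to-page transitions. The 30-minute inactivity timeout is a common heuristic that I am assuming here, and transition counting is a deliberately simple stand-in for the machine learning techniques the authors refer to. All function names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cutoff

def sessionize(requests):
    """Pre-processing: split each user's (user, path, timestamp) requests into sessions."""
    by_user = defaultdict(list)
    for user, path, when in sorted(requests, key=lambda r: (r[0], r[2])):
        by_user[user].append((path, when))
    sessions = []
    for hits in by_user.values():
        current = [hits[0][0]]
        for (_, prev_time), (path, when) in zip(hits, hits[1:]):
            if when - prev_time > SESSION_TIMEOUT:
                sessions.append(current)  # gap too long: close the session
                current = []
            current.append(path)
        sessions.append(current)
    return sessions

def count_transitions(sessions):
    """Pattern discovery (simplified): count consecutive page pairs across sessions."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts

# Made-up requests, for illustration only.
t0 = datetime(2015, 12, 4, 10, 0)
requests = [
    ("u1", "/home", t0),
    ("u1", "/products", t0 + timedelta(minutes=5)),
    ("u1", "/home", t0 + timedelta(hours=2)),
    ("u2", "/home", t0),
    ("u2", "/products", t0 + timedelta(minutes=1)),
]

if __name__ == "__main__":
    print(count_transitions(sessionize(requests)).most_common(3))
```

In the operational phase, frequent transitions like these could feed navigation shortcuts or recommendations, which is the kind of adaptation the quote describes.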


Arbelaitz, O., Gurrutxaga, I., Lojo, A., Muguerza, J., Pérez, J. M., & Perona, I. (2013). Web usage and content mining to extract knowledge for modelling the users of the Bidasoa Turismo website and to adapt it. Expert Systems with Applications, 40(18), 7478–7491. doi:10.1016/j.eswa.2013.07.040
Cooley, R., Mobasher, B., & Srivastava, J. (1997). Web mining: Information and pattern discovery on the World Wide Web. IEEE International Conference on Tools with Artificial Intelligence, 558–567. doi:10.1109/TAI.1997.632303
Mobasher, B., Cooley, R., & Srivastava, J. (2000). Web usage mining can help improve the scalability, accuracy, and flexibility of recommender systems. Communications of the ACM, 43(8), 142–151. doi:10.1145/345124.345169
Mobasher, B., Jain, N., Han, E. S., & Srivastava, J. (1997). Web mining: Pattern discovery from World Wide Web transactions. Technical Report, 1–25. Retrieved from http://eolo.cps.unizar.es/docencia/doctorado/Articulos/DataWebMining/webminer-tr96.pdf
Yang, Q., Kou, J., Chen, F., & Li, M. (2007). A new similarity measure for generalized Web session clustering. Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007), 3.
