Cooley, Mobasher, and Srivastava (1997) define web mining as “the discovery and analysis of useful information from the World Wide Web” and as the “application of data mining techniques to the World Wide Web.”
After a couple of months of reading papers on this subject, my take is that web mining is the many different ways of gaining insight, most often into how users use a given web “site,” by applying data mining techniques to all the data that a web server accumulates. Because nearly every web presence in 2015 is some form of web application, mining the data produced on such web servers goes well beyond who requested which “page” and when. The papers from many years ago, however, focused on exactly those three attributes, since those first papers gleaned their data almost exclusively from web server logs.
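To make the “who requested which page and when” view concrete, here is a minimal Python sketch that pulls those three attributes out of a single server log line. It assumes the log uses the Common Log Format written by Apache-style servers; the names `parse_clf_line` and `Hit` are my own and do not come from any of the papers cited here.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed input: the Common Log Format (CLF):
# host ident authuser [timestamp] "method path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


class Hit(NamedTuple):
    who: str        # authenticated user if present, otherwise the client host
    page: str       # requested path
    when: datetime  # timezone-aware request time


def parse_clf_line(line: str) -> Optional[Hit]:
    """Return (who, page, when) for one CLF line, or None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    who = m.group("user") if m.group("user") != "-" else m.group("host")
    when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return Hit(who, m.group("path"), when)
```

Lines that do not match the pattern (for example, requests with no protocol field) return None, so a real preprocessing step would want to count and inspect those rather than silently drop them.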
In 2015, we can mine new dimensions of a user's experience and form a more elaborate view of the user’s context, because we can gather much more data on what the user does within a given “page.” We have also come to rank the importance of different tasks, for instance viewing an item versus purchasing an item in eCommerce.
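One simple way to express that kind of ranking, sketched here under my own assumptions, is to give each type of interaction a weight and sum the weights per item. The event names and weights (1 for a view, 3 for adding to a cart, 10 for a purchase) are hypothetical placeholders, not values taken from the literature.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical weights: a purchase is treated as a much stronger signal than a view.
EVENT_WEIGHTS: Dict[str, float] = {
    "view": 1.0,
    "add_to_cart": 3.0,
    "purchase": 10.0,
}


def score_items(events: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, float]]:
    """Sum weighted interactions per item.

    Each event is a (user_id, item_id, event_type) tuple. Unknown event
    types add 0.0 to the item's score. Returns items sorted by descending score.
    """
    scores: Dict[str, float] = defaultdict(float)
    for _user, item, kind in events:
        scores[item] += EVENT_WEIGHTS.get(kind, 0.0)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With these weights, `score_items([("u1", "shoes", "view"), ("u2", "shoes", "view"), ("u3", "hat", "view"), ("u3", "hat", "purchase")])` returns `[("hat", 11.0), ("shoes", 2.0)]`: a single purchase outweighs several views.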
So we now have richer sources of data originating from the web server, but we still tend to focus on how users use these web applications so that we can improve the experience and build a more valuable application. This might be why more recent papers tend to use the term “web usage mining” rather than simply “web mining.”
Mobasher, Cooley, and Srivastava (2000) explain further that “web usage mining systems run any number of data mining algorithms on usage or clickstream data gathered from one or more Web sites in order to discover user profiles.” Yang, Kou, Chen, and Li (2007) explain that Web usage mining “is the application for data mining techniques to analyze and discover interesting patterns of user’s usage data on the web.”
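Before any of those algorithms can run, the raw clickstream usually has to be split into sessions. The sketch below uses an inactivity timeout, which is one common heuristic; the 30-minute cutoff and the `sessionize` function are assumptions for illustration, not the procedure used in either paper quoted above.

```python
from datetime import datetime, timedelta
from itertools import groupby
from typing import Dict, List, Optional, Tuple

# Assumed inactivity cutoff; 30 minutes is a common default, not a standard.
SESSION_TIMEOUT = timedelta(minutes=30)

Click = Tuple[str, str, datetime]  # (user_id, page, timestamp)


def sessionize(clicks: List[Click],
               timeout: timedelta = SESSION_TIMEOUT) -> Dict[str, List[List[str]]]:
    """Split each user's clicks into ordered sessions of page paths.

    A new session starts whenever the gap since that user's previous
    click exceeds `timeout`.
    """
    sessions: Dict[str, List[List[str]]] = {}
    # Sort by user, then time, so each user's clicks are contiguous and in order.
    ordered = sorted(clicks, key=lambda c: (c[0], c[2]))
    for user, user_clicks in groupby(ordered, key=lambda c: c[0]):
        user_sessions: List[List[str]] = []
        last_time: Optional[datetime] = None
        for _, page, ts in user_clicks:
            if last_time is None or ts - last_time > timeout:
                user_sessions.append([])  # gap too long: start a new session
            user_sessions[-1].append(page)
            last_time = ts
        sessions[user] = user_sessions
    return sessions
```

Each resulting session (a list of pages) is the unit that later clustering or profile-discovery steps would work on.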
A complete discussion of the processes and methods of web mining is beyond the scope of this post and is probably best covered in a future textbook, but I would like to quote Arbelaitz et al. (2013), who summarize this area of data mining research:
“Web mining can be defined as the application of machine learning techniques to data from the Internet. This process requires a data acquisition and pre-processing stage. The machine learning techniques are mainly applied in the pattern discovery and analysis phase to find groups of web users with common characteristics related to the Internet and the corresponding patterns or user profiles. Finally, the patterns detected in the previous steps are used in the operational phase to adapt the system and make navigation more efficient for new users or to extract important information for the service providers.”
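As a rough illustration of the pattern discovery and operational phases described in that quote, the sketch below groups sessions (sets of visited pages) by Jaccard similarity using a simple greedy “leader” clustering, turns each group into a profile of frequently visited pages, and suggests unvisited profile pages to a new user. Jaccard similarity, leader clustering, the function names, and both thresholds are simple stand-ins I chose for illustration; they are not the methods used by Arbelaitz et al. (2013) or the similarity measure proposed by Yang et al. (2007).

```python
from typing import FrozenSet, List, Set

Session = FrozenSet[str]  # the set of pages visited in one session


def jaccard(a: Session, b: Session) -> float:
    """Jaccard similarity |a & b| / |a | b| between two page sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(sessions: List[Session], threshold: float = 0.5) -> List[List[Session]]:
    """Greedy single-pass clustering: add each session to the first cluster whose
    leader (its first session) is at least `threshold` similar, else start a new one."""
    clusters: List[List[Session]] = []
    for s in sessions:
        for cluster in clusters:
            if jaccard(s, cluster[0]) >= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def build_profile(cluster: List[Session], min_support: float = 0.5) -> Set[str]:
    """Profile = pages appearing in at least `min_support` of the cluster's sessions."""
    pages = set().union(*cluster)
    return {p for p in pages
            if sum(p in s for s in cluster) / len(cluster) >= min_support}


def recommend(current: Session, profiles: List[Set[str]]) -> Set[str]:
    """Operational phase: suggest unvisited pages from the most similar profile."""
    if not profiles:
        return set()
    best = max(profiles, key=lambda prof: jaccard(current, frozenset(prof)))
    return best - current
```

For example, with the sessions {/home, /shoes, /cart}, {/home, /shoes}, {/about, /contact}, and {/about, /contact, /jobs}, the default thresholds yield two clusters, and a new visitor who has seen /home and /shoes would be pointed to /cart.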
Arbelaitz, O., Gurrutxaga, I., Lojo, A., Muguerza, J., Pérez, J. M., & Perona, I. (2013). Web usage and content mining to extract knowledge for modelling the users of the Bidasoa Turismo website and to adapt it. Expert Systems with Applications, 40(18), 7478–7491. doi:10.1016/j.eswa.2013.07.040
Cooley, R., Mobasher, B., & Srivastava, J. (1997). Web mining: Information and pattern discovery on the World Wide Web. IEEE International Conference on Tools with Artificial Intelligence, 558–567. doi:10.1109/TAI.1997.632303
Mobasher, B., Cooley, R., & Srivastava, J. (2000). Web usage mining can help improve the scalability, accuracy, and flexibility of recommender systems. Communications of the ACM, 43(8), 142–151. doi:10.1145/345124.345169
Mobasher, B., Jain, N., Han, E. S., & Srivastava, J. (1997). Web mining: Pattern discovery from World Wide Web transactions. Technical Report, 1–25. Retrieved from http://eolo.cps.unizar.es/docencia/doctorado/Articulos/DataWebMining/webminer-tr96.pdf
Yang, Q., Kou, J., Chen, F., & Li, M. (2007). A new similarity measure for generalized web session clustering. Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007), 3.