Characterizing Web Spam Using Content and HTTP Session Analysis

Short Description
Another challenge posed by the JavaScript techniques is the nondeterministic behavior of … JavaScript location object. This technique accounts for 7% …

Website: faculty.cs.tamu.edu | Filesize: 541kb

Content
Characterizing Web Spam Using Content and HTTP Session Analysis
Steve Webb
College of Computing, Georgia Institute of Technology
Atlanta, GA 30332
webb@cc.gatech.edu

James Caverlee
College of Computing, Georgia Institute of Technology
Atlanta, GA 30332
caverlee@cc.gatech.edu

Calton Pu
College of Computing, Georgia Institute of Technology
Atlanta, GA 30332
calton@cc.gatech.edu

ABSTRACT
Web spam research has been hampered by a lack of statistically
significant collections. In this paper, we perform the
first large-scale characterization of web spam using content
and HTTP session analysis techniques on the Webb Spam
Corpus - a collection of about 350,000 web spam pages. Our
content analysis results are consistent with the hypothesis
that web spam pages are different from normal web pages,
showing far more duplication of physical content and URL
redirections. An analysis of session information collected
during the crawling of the Webb Spam Corpus shows significant
concentration of hosting IP addresses in two narrow
ranges as well as significant overlaps among session header
values. These findings suggest that content and HTTP session
analysis may contribute a great deal towards future
efforts to automatically distinguish web spam pages from
normal web pages.
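
As an illustration of the two signals the abstract names, the following is a minimal sketch and not necessarily the authors' method. It assumes that "duplication of physical content" can be approximated by grouping pages whose raw bytes hash to the same digest, and that concentration of hosting IP addresses can be approximated by counting addresses per leading-octet prefix. The function names and the sample URLs and addresses are hypothetical.

```python
import hashlib
from collections import Counter, defaultdict


def content_digest(body: bytes) -> str:
    """Return a hex digest identifying byte-identical page bodies."""
    return hashlib.sha256(body).hexdigest()


def duplicate_clusters(pages):
    """Group URLs whose bodies are byte-for-byte identical.

    pages: iterable of (url, body_bytes) pairs.
    Returns a list of URL lists, one per digest shared by 2+ pages.
    """
    clusters = defaultdict(list)
    for url, body in pages:
        clusters[content_digest(body)].append(url)
    return [urls for urls in clusters.values() if len(urls) > 1]


def ip_prefix_counts(ips, octets=2):
    """Count IPv4 addresses per leading-octet prefix (assumed granularity)."""
    return Counter(".".join(ip.split(".")[:octets]) for ip in ips)


# Hypothetical sample data, not taken from the Webb Spam Corpus.
pages = [
    ("http://a.example/", b"<html>buy now</html>"),
    ("http://b.example/", b"<html>buy now</html>"),
    ("http://c.example/", b"<html>other</html>"),
]
ips = ["192.0.2.10", "192.0.2.77", "198.51.100.5"]

clusters = duplicate_clusters(pages)
prefix_counts = ip_prefix_counts(ips)
```

Exact hashing only catches byte-identical pages; near-duplicates would need a different technique such as shingling, which this sketch does not attempt.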
1. INTRODUCTION
Web spam has…
