Wednesday, October 13, 2010

11:07 PM
This post shows how to use PHP to extract the keywords a user searched for when they found your website through a search engine. Bing, Google and Yahoo are covered here, and you can easily add your own to the PHP code supplied.


PHP functions used

The code example here uses the parse_url function to split the referer URL into its parts and then the parse_str function to extract the query string variables into an array.
The referer URL is available in the $_SERVER PHP superglobal as $_SERVER['HTTP_REFERER'], but only if the web browser sent it.
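As a rough illustration of how the two functions fit together, the sketch below runs them on a hard-coded Bing referer. The variable names are only for illustration, and a real referer would come from $_SERVER['HTTP_REFERER'] instead.

```php
<?php
// Illustrative only: a hard-coded referer stands in for $_SERVER['HTTP_REFERER'].
$referer = 'http://www.bing.com/search?q=javascript+date+to+timestamp&src=IE-SearchBox&FORM=IE8SRC';

// parse_url() returns an array with keys such as 'scheme', 'host', 'path' and 'query'.
$parts = parse_url($referer);

// parse_str() fills $query with the decoded query string variables.
parse_str($parts['query'], $query);

echo $parts['host'], "\n"; // www.bing.com
echo $query['q'], "\n";    // javascript date to timestamp
```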

Referer URL examples

Here are some example referer URLs from Bing, Google and Yahoo, taken from people reaching this blog.
http://www.bing.com/search?q=javascript+date+to+timestamp&src=IE-SearchBox&FORM=IE8SRC

http://www.google.de/search?q=apache+restart&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:de:official&client=firefox-a

http://us.yhs.search.yahoo.com/avg/search?fr=yhs-avg-chrome&type=yahoo_avg_hs2-tb-web_chrome_us&p=concatenation+in+mysql

You can see from the URLs that Bing and Google store the keywords in the "q" variable and Yahoo uses "p".

The code

Here's the PHP code to extract the keywords entered from the above examples:
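function search_engine_query_string($url = false) {
    // Fall back to the browser-supplied referer when no URL is passed in.
    if(!$url) {
        $url = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : false;
    }
    if($url == false) {
        return '';
    }
    // Split the URL into its parts, then the query string into an array.
    $parts = parse_url($url);
    parse_str(isset($parts['query']) ? $parts['query'] : '', $query);
    // Main part of each search engine's domain => its keyword variable.
    $search_engines = array(
        'bing' => 'q',
        'google' => 'q',
        'yahoo' => 'p'
    );
    // Look for one of the search engine names followed by a dot in the host.
    preg_match('/(' . implode('|', array_keys($search_engines)) . ')\./', isset($parts['host']) ? $parts['host'] : '', $matches);
    // Return the keywords only if both the engine and its variable were found.
    return isset($matches[1]) && isset($query[$search_engines[$matches[1]]]) ? $query[$search_engines[$matches[1]]] : '';
}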

The function uses the URL passed in, or falls back to $_SERVER['HTTP_REFERER'] if none is passed. It then extracts the parts of the URL (line 10) and breaks the query string into values in an associative array (line 11).
A list of search engines is defined from lines 13 to 17 as an associative array containing the main part of the domain (i.e. in www.google.com the 'google' bit) mapped to the variable name in the query string. You can add additional search engines to this array.
Note that the array key (i.e. the 'google' bit) followed by a period/dot is matched anywhere in the referer's host name. Therefore 'google' would match www.google.com, www.google.co.nz and even notgoogle.com.
The regular expression could be modified to ensure there's a period/dot at the start of the host OR the host starts with the domain, but I'm personally happy to leave it as-is for the moment; you are free of course to modify the code if you prefer to ensure a more exact match.
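For anyone who does want the stricter match, here is a minimal sketch of one way to do it. The helper name search_engine_from_host is made up for this example and is not part of the function above; it only shows the modified regular expression at work.

```php
<?php
// Hypothetical helper: returns the matching search engine key, or '' if none.
function search_engine_from_host($host, array $search_engines) {
    // Require the name to sit at the start of the host or directly after a dot.
    $pattern = '/(?:^|\.)(' . implode('|', array_keys($search_engines)) . ')\./';
    return preg_match($pattern, $host, $matches) ? $matches[1] : '';
}

$search_engines = array('bing' => 'q', 'google' => 'q', 'yahoo' => 'p');

var_dump(search_engine_from_host('www.google.co.nz', $search_engines)); // string(6) "google"
var_dump(search_engine_from_host('google.com', $search_engines));       // string(6) "google"
var_dump(search_engine_from_host('notgoogle.com', $search_engines));    // string(0) ""
```

This is still not an exact match: a host such as google.example.com would pass, so treat it as a tighter filter rather than proof that the visitor came from the search engine.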
The regular expression on line 19 matches the search engine name into the $matches array, and line 21 returns the keywords if the search engine domain matched and a keyword variable was found.
Note that parse_str decodes any URL encoding, so e.g. "javascript+date+to+timestamp" will be returned as "javascript date to timestamp".
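The same decoding applies to percent-encoded characters. A quick sketch with a made-up query string, where "+" stands for a space and "%2B" for a literal plus sign:

```php
<?php
// Made-up query string: "+" encodes a space, "%2B" encodes a literal plus sign.
parse_str('q=c%2B%2B+tutorial', $query);

echo $query['q'], "\n"; // c++ tutorial
```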

Examples

Here are some examples of running the above function using the referer URLs from the beginning of the post:

echo search_engine_query_string('http://www.bing.com/search?q=javascript+date+to+timestamp&src=IE-SearchBox&FORM=IE8SRC');

// echoes "javascript date to timestamp"

echo search_engine_query_string('http://www.google.de/search?q=apache+restart&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:de:official&client=firefox-a');

// echoes "apache restart"

echo search_engine_query_string('http://us.yhs.search.yahoo.com/avg/search?fr=yhs-avg-chrome&type=yahoo_avg_hs2-tb-web_chrome_us&p=concatenation+in+mysql');

// echoes "concatenation in mysql"
