
Applied Math Forum
February 26th, 2009, 10:32 PM  #1 
Newbie Joined: Feb 2009 Posts: 6 Thanks: 0  keyword relativity datatype weights and algorithm
Hello, I have an issue with a project we're working on. I'm building a search engine and need help with the algorithm. We have multiple datatypes, e.g. "meta title", "meta description", "meta keywords", "page copy", "alternative text", "tags", and several others, that I would like to assign weights to according to what I think is important. I would also like to incorporate relativity for the words and phrases being searched within these datatypes. I'm really good at MySQL and PHP, but I haven't applied math like this since college, which was about 17 years ago. I'm completely clueless about where to start, or even whether this is the right area to post in. Any direction at all would be greatly appreciated!
February 27th, 2009, 07:13 PM  #2 
Global Moderator Joined: Nov 2006 From: UTC 5 Posts: 16,046 Thanks: 932 Math Focus: Number theory, computational mathematics, combinatorics, FOM, symbolic logic, TCS, algorithms  Re: keyword relativity datatype weights and algorithm
Moved to Applied Math. A simple algorithm would be to add up the weights of each item that applies. If a keyword is in the meta title, add META_TITLE_WEIGHT to a running total; if a keyword is in the page copy, add PAGE_COPY_WEIGHT; and so on. You could modify it later to favor short meta titles and the like: a 1000-word meta title that has your keyword means a lot less than a 6-word meta title.
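A minimal Python sketch of the additive scheme described above. The field names, weight values, and the `normalize_by_length` option are illustrative assumptions, not part of any library; the sample page text is borrowed from later in the thread.

```python
# Hypothetical weights: names and values are assumptions chosen for illustration.
FIELD_WEIGHTS = {
    "meta_title": 5.0,
    "meta_description": 3.0,
    "meta_keywords": 2.0,
    "page_copy": 1.0,
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


def score(page, keyword, normalize_by_length=False):
    """Add up the weight of every field that contains the keyword."""
    keyword = keyword.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = tokenize(page.get(field, ""))
        if keyword in words:
            if normalize_by_length:
                # Shorter fields count for more: a 6-word title
                # outweighs a 1000-word title containing the same keyword.
                total += weight / len(words)
            else:
                total += weight
    return total


page = {
    "meta_title": "Sample Math Site",
    "meta_description": "The description of the math site",
    "meta_keywords": "math, algebra, geometry, physics",
    "page_copy": "all the text within the main page",
}

print(score(page, "math"))     # title + description + keywords
print(score(page, "algebra"))  # keywords only
```

Dividing by the field length is only one possible way to favor short fields; any decreasing function of length would serve the same purpose.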
February 28th, 2009, 02:16 PM  #3 
Newbie Joined: Feb 2009 Posts: 6 Thanks: 0  Re: keyword relativity datatype weights and algorithm
Thanks so much, CRGreathouse. How would this look from a formula perspective? Adding up the weights of each item that applies makes sense; I just don't know how the formula would look or how it would be applied to a predefined index generation process. For example, if I've got 50k sites to be placed into the index, I need a running-total formula that scores each datatype for each site, so that each site can be ranked when words or phrases are searched. Would a running-total formula be enough for this? Please excuse my ignorance.
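One way the additive scheme from post #2 could be written as a formula, offered as a sketch rather than a definitive answer. All symbols are assumptions introduced here: $F$ is the set of datatypes, $w_f$ is the weight chosen for datatype $f$, $q$ is the set of searched terms, $f(d)$ is the set of words in datatype $f$ of site $d$, and $\mathbf{1}[\cdot]$ is 1 when the condition holds and 0 otherwise.

```latex
\[
  \operatorname{score}(d, q) \;=\; \sum_{f \in F} w_f \sum_{t \in q} \mathbf{1}\bigl[\, t \in f(d) \,\bigr]
\]
```

Under this reading, ranking means sorting the sites by $\operatorname{score}(d, q)$ in descending order. The indicator values do not depend on the query, so they could be stored per site, datatype, and term when the index is built and summed at search time.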
February 28th, 2009, 02:50 PM  #4 
Global Moderator Joined: Nov 2006 From: UTC 5 Posts: 16,046 Thanks: 932 Math Focus: Number theory, computational mathematics, combinatorics, FOM, symbolic logic, TCS, algorithms  Re: keyword relativity datatype weights and algorithm
I don't know what you mean by running total formula. Why don't you just write what you're intending to use (or a simplified version of it, or pseudocode, or whatever) and I'll tell you what I think.

February 28th, 2009, 10:08 PM  #5 
Newbie Joined: Feb 2009 Posts: 6 Thanks: 0  Re: keyword relativity datatype weights and algorithm
Okay, first let me show you how we're storing data. After crawling a page, much like Google's bots do, we insert the data into the db as datatypes. Example data:

SiteMetaTitle: Sample Math Site
SiteMetaDesc: The description of the math site
SiteMetaKeys: math, algebra, geometry, physics
SiteCopy: all the text within the main page.

I'm using the Zend Search Lucene libraries to build the index: http://devzone.zend.com/node/view/id/91

As part of the index-building process a score for each site is calculated, but it's very rudimentary and easily abused by repeating keywords in pages. Here is the current default algorithm: http://163.23.89.100/tea_doc/Zend/zend. ... ng.scoring

I hope this somewhat clears up what I'm asking. Essentially, I want to make the algorithm a little more intelligent by assigning weights to each datatype and to what are termed long-tail phrases (3 or more words) within any datatype. If I knew the math behind it, couldn't I expand upon what's provided? In fact, I found a light write-up on how to assign weights in the code here: http://spindrop.us/2007/05/29/boosting ... chlucene/ However, I'd rather come up with a more efficient, automated way of doing it. I'm lost on the math.

Thanks again for your valuable time; it's greatly appreciated.
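A rough sketch, in plain Python rather than the Zend Search Lucene API, of one way to combine per-datatype weights with a long-tail phrase bonus while blunting keyword repetition. The weight values, the `LONG_TAIL_BONUS` multiplier, and the logarithmic damping are all assumptions made for illustration; this is not Lucene's default scoring.

```python
import math

# Hypothetical per-datatype weights (assumed values, to be tuned).
FIELD_WEIGHTS = {
    "SiteMetaTitle": 5.0,
    "SiteMetaDesc": 3.0,
    "SiteMetaKeys": 2.0,
    "SiteCopy": 1.0,
}
LONG_TAIL_BONUS = 2.0  # assumed multiplier for exact phrases of 3+ words


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


def count_phrase(words, phrase):
    """Count occurrences of the word list `phrase` inside the word list `words`."""
    n = len(phrase)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)


def score(site, query):
    """Weighted sum over datatypes of a log-damped phrase count."""
    phrase = tokenize(query)
    if not phrase:
        return 0.0
    bonus = LONG_TAIL_BONUS if len(phrase) >= 3 else 1.0
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        hits = count_phrase(tokenize(site.get(field, "")), phrase)
        # log1p grows slowly, so repeating a keyword 100 times earns far
        # less than 100 times the credit of a single occurrence.
        total += weight * bonus * math.log1p(hits)
    return total


honest = {"SiteMetaTitle": "Sample Math Site", "SiteCopy": "math lessons"}
stuffed = {"SiteMetaTitle": "Sample Math Site", "SiteCopy": "math " * 100}
print(score(honest, "math"), score(stuffed, "math"))
```

With these assumed numbers, the stuffed page still scores higher than the honest one, but by less than a factor of two instead of a factor proportional to the repetition count. A stricter cap on `hits` would be another option.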
