My Math Forum  

February 26th, 2009, 10:32 PM   #1
Newbie
 
Joined: Feb 2009

Posts: 6
Thanks: 0

keyword relativity datatype weights and algorithm

Hello,

I have an issue with a project we're working on. I'm building a search engine and need help with the algorithm.

We have multiple datatypes, e.g. "meta title", "meta description", "meta keywords", "page copy", "alternative text", "tags", and several others, to which I would like to assign weights according to what I think is important. I'd also like to incorporate the relevance ("relativity") of the searched words and phrases that appear within these datatypes.

I'm really good at MySQL and PHP, but I haven't applied math like this since college, which was about 17 years ago.

I'm completely clueless about where to start, or even whether this is the right area to post.

Any direction at all would be greatly appreciated!
totus is offline  
 
February 27th, 2009, 07:13 PM   #2
Global Moderator
 
Joined: Nov 2006
From: UTC -5

Posts: 16,046
Thanks: 932

Math Focus: Number theory, computational mathematics, combinatorics, FOM, symbolic logic, TCS, algorithms
Re: keyword relativity datatype weights and algorithm

Moved to Applied Math.

A simple algorithm would be to add up the weights of each item that applies. If a keyword is in the meta title, add META_TITLE_WEIGHT to a running total; if a keyword is in page copy add in PAGE_COPY_WEIGHT; and so on.
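As a rough illustration of the additive scheme described above, here is a minimal Python sketch. The weight values, the field keys, and the function name `score_page` are assumptions made up for the example; the thread only names the constants (META_TITLE_WEIGHT, PAGE_COPY_WEIGHT) without giving values. The sample page reuses the example data posted later in this thread.

```python
import re

# Hypothetical weights: the thread names these constants but gives no values.
FIELD_WEIGHTS = {
    "meta_title": 5.0,        # META_TITLE_WEIGHT
    "meta_description": 3.0,
    "meta_keywords": 2.0,
    "page_copy": 1.0,         # PAGE_COPY_WEIGHT
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score_page(page, keyword):
    """Add the weight of every field that contains the keyword."""
    keyword = keyword.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        if keyword in tokenize(page.get(field, "")):
            total += weight
    return total

page = {
    "meta_title": "Sample Math Site",
    "meta_description": "The description of the math site",
    "meta_keywords": "math, algebra, geometry, physics",
    "page_copy": "all the text within the main page.",
}

print(score_page(page, "math"))     # title + description + keywords = 10.0
print(score_page(page, "algebra"))  # keywords only = 2.0
```

Each field counts at most once here, no matter how often the keyword is repeated inside it.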

You could later modify it to favor short fields, such as short meta titles. A 1000-word meta title that contains your keyword means a lot less than a 6-word meta title.
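One way to favor short fields, sketched under assumptions: divide a field's weight by the square root of its length in words, so a match in a long field counts for less. The 1/sqrt(n) factor is just one common choice of length normalization, and the function name `field_score` and the weight 5.0 are hypothetical.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def field_score(text, keyword, weight):
    """Weight of a matching field, scaled down as the field gets longer."""
    tokens = tokenize(text)
    if keyword.lower() not in tokens:
        return 0.0
    # Assumed normalization: 1 / sqrt(number of words in the field).
    return weight / math.sqrt(len(tokens))

short_title = "Sample Math Site Online"    # 4 words
long_title = "math " + "filler " * 99      # 100 words

print(field_score(short_title, "math", 5.0))  # 5.0 / sqrt(4)   = 2.5
print(field_score(long_title, "math", 5.0))   # 5.0 / sqrt(100) = 0.5
```

Other damping functions (for example 1/n, or a cap on the length) would also fit the description above; which one works best would have to be tuned on real data.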
CRGreathouse is offline  
February 28th, 2009, 02:16 PM   #3
Newbie
 
Joined: Feb 2009

Posts: 6
Thanks: 0

Re: keyword relativity datatype weights and algorithm

Thanks so much, CRGreathouse. How would this look as a formula? Adding up the weights of each item that applies makes sense; I just don't know what the formula would look like or how it would be applied to a predefined index-generation process.

For example, if I've got 50k sites to place into the index, I need a running-total formula that scores each datatype for each site, to determine how each site should rank when words or phrases are searched. Would a running total be enough for this?

Please excuse my ignorance.
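One possible reading of the running total asked about above, written as a formula: score(site, query) = sum, over every datatype f and every query word w, of weight[f] × (1 if w occurs in f, else 0). Sites are then sorted by that score. The Python sketch below follows this reading; the weights, field keys, URLs, and function names are all hypothetical.

```python
import re

# Hypothetical per-datatype weights.
FIELD_WEIGHTS = {
    "meta_title": 5.0,
    "meta_description": 3.0,
    "meta_keywords": 2.0,
    "page_copy": 1.0,
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score_site(site, query):
    """Running total: add weight[f] for each (field, query word) match."""
    query_words = set(tokenize(query))
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        field_words = set(tokenize(site.get(field, "")))
        total += weight * len(query_words & field_words)
    return total

def rank_sites(sites, query):
    """Return (score, url) pairs with a nonzero score, best first."""
    scored = [(score_site(site, query), site["url"]) for site in sites]
    matches = [pair for pair in scored if pair[0] > 0]
    return sorted(matches, key=lambda pair: (-pair[0], pair[1]))

sites = [
    {"url": "a.example", "meta_title": "Sample Math Site",
     "page_copy": "algebra and geometry"},
    {"url": "b.example", "meta_title": "Cooking Blog",
     "page_copy": "math for recipes"},
    {"url": "c.example", "meta_title": "Gardening", "page_copy": "plants"},
]

print(rank_sites(sites, "math"))
# [(5.0, 'a.example'), (1.0, 'b.example')]
```

For 50k sites, looping over every site per query as done here would likely be too slow; an index library normally precomputes which sites contain which words, and the same formula is then applied only to the matching sites.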
totus is offline  
February 28th, 2009, 02:50 PM   #4
Global Moderator
 
Joined: Nov 2006
From: UTC -5

Posts: 16,046
Thanks: 932

Math Focus: Number theory, computational mathematics, combinatorics, FOM, symbolic logic, TCS, algorithms
Re: keyword relativity datatype weights and algorithm

I don't know what you mean by running total formula. Why don't you just write what you're intending to use (or a simplified version of it, or pseudocode, or whatever) and I'll tell you what I think.
CRGreathouse is offline  
February 28th, 2009, 10:08 PM   #5
Newbie
 
Joined: Feb 2009

Posts: 6
Thanks: 0

Re: keyword relativity datatype weights and algorithm

Okay, first let me show you how we're storing data:

After crawling a page, much like Google's bots do, we insert data such as the following example into the database as datatypes:

SiteMetaTitle: Sample Math Site
SiteMetaDesc: The description of the math site
SiteMetaKeys: math, algebra, geometry, physics
SiteCopy: all the text within the main page.

I'm using the Zend Search Lucene libraries to build the index: http://devzone.zend.com/node/view/id/91

As part of the index-building process, a score for each site is calculated, but it's very rudimentary and easily abused by repeating keywords in pages. Here is the current default algorithm:

http://163.23.89.100/tea_doc/Zend/zend. ... ng.scoring

I hope this somewhat clears up what I'm asking. Essentially, I want to make the algorithm a little more intelligent by assigning weights to each datatype, and to what are termed long-tail phrases (3 or more words) within any datatype. If I knew the math behind it, couldn't I expand upon what's provided? In fact, I found a light write-up on how to assign weights in the code here: http://spindrop.us/2007/05/29/boosting- ... ch-lucene/ However, I'd rather come up with a more efficient, automated way of doing it. I'm lost on the math.

Thanks again for your valuable time; it's greatly appreciated.
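To make the two goals above concrete (resisting keyword repetition, and rewarding long-tail phrases), here is a small Python sketch. It is not Zend Search Lucene's scoring code or API. The weights, the boost value, and every function name are assumptions for illustration. Repeated keywords are damped with 1 + ln(count), so 100 repetitions count only a few times more than one, and a query of 3 or more words that appears as an exact phrase in a datatype earns an extra boost.

```python
import math
import re

# Hypothetical weights for the datatypes listed in this post.
FIELD_WEIGHTS = {
    "SiteMetaTitle": 5.0,
    "SiteMetaDesc": 3.0,
    "SiteMetaKeys": 2.0,
    "SiteCopy": 1.0,
}
LONG_TAIL_BOOST = 2.0  # hypothetical multiplier for exact phrase matches

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def damped_tf(count):
    """Sublinear term frequency: repetition helps, but less and less."""
    return 1.0 + math.log(count) if count > 0 else 0.0

def contains_phrase(tokens, phrase):
    """True if the list `phrase` occurs contiguously inside `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def score_site(site, query):
    query_tokens = tokenize(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(site.get(field, ""))
        for word in set(query_tokens):
            total += weight * damped_tf(tokens.count(word))
        # Long-tail phrases are taken to be 3 or more words, as in the post.
        if len(query_tokens) >= 3 and contains_phrase(tokens, query_tokens):
            total += weight * LONG_TAIL_BOOST
    return total

honest = {"SiteCopy": "math help"}
stuffed = {"SiteCopy": "math " * 100}
print(score_site(honest, "math"))   # 1.0
print(score_site(stuffed, "math"))  # 1 + ln(100), well under 100x the honest score
```

The damping only limits the payoff of repetition; it does not detect abuse. Combining it with a length penalty on the datatype would reduce the benefit of stuffing further.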
totus is offline  
Tags
algorithm, datatype, keyword, relativity, weights







