::cubework

current release branch 0.9

What is it?


cubework is a home-made, object-oriented PHP framework built on the MVC architecture. It is a one-man project; development started around August 2009.

Main features:
* lightweight (only 140KB)
* object-oriented
* Model-View-Controller architecture
* database: everything that PDO supports
* cache: generic, memcached
* search support
* multi-language support (gettext)

0.9 released

posted by gman on Dec, 30th 2010 @ 20:24
The major thing in this new release branch is support for searching. The class making all that search magic happen is called Lib_Search.

=== Architecture ===
It uses an inverted index[1][2] and indexes all words longer than a user-defined value (2 by default).
index => (COO)
COO (Coordinate List) is an array consisting of 4 entries: table name, primary key, primary key column name and hit score. Once built, the index is saved into the cache. Since memcached's default maximum size of one entry is 1MB and there is a chance that all the indices will take more space than that, we need to be able to save the index to cubework's generic cache instead. For that, I created the setCacheType and resetCacheType methods. The first one takes 1 parameter, the cache name (memcached or generic); the second one resets the cache type. This way you can change the cache type on the fly.
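
To make the index => (COO) layout more concrete, here is a minimal sketch of how such an inverted index could be built. It is not the Lib_Search source: the function name buildInvertedIndex, the row format, the ASCII-only word splitting and the scoring (one point per occurrence) are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch, not the actual Lib_Search code.
// Builds an inverted index: word => list of COOs, where each COO is
// array(table name, primary key, primary key column name, hit score).
function buildInvertedIndex(array $rows, $table, $pkColumn, $minLength = 2)
{
    $index = array();
    foreach ($rows as $row) {
        $pk = $row[$pkColumn];
        $counts = array();
        foreach ($row as $column => $text) {
            if ($column === $pkColumn) {
                continue; // the primary key itself is not indexed
            }
            // Assumption: ASCII text only; split on anything that is not a-z or 0-9.
            $words = preg_split('/[^a-z0-9]+/', strtolower((string) $text), -1, PREG_SPLIT_NO_EMPTY);
            foreach ($words as $word) {
                // Index only words longer than the minimum length.
                if (strlen($word) > $minLength) {
                    $counts[$word] = isset($counts[$word]) ? $counts[$word] + 1 : 1;
                }
            }
        }
        // Assumed scoring: the hit score is the number of occurrences in the row.
        foreach ($counts as $word => $score) {
            $index[$word][] = array($table, $pk, $pkColumn, $score);
        }
    }
    return $index;
}

// Made-up sample rows from a hypothetical "posts" table.
$rows = array(
    array('id' => 1, 'title' => 'Hello world', 'body' => 'hello again'),
    array('id' => 2, 'title' => 'Killing squirrel', 'body' => 'a world of squirrels'),
);
$index = buildInvertedIndex($rows, 'posts', 'id');
```

In this sketch every word maps to a list of COOs, so a lookup is a single array access, which is the key => value behaviour the rest of this post relies on.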

=== Implementation ===
It's very easy to add cubework's search engine to your application. The first thing to do is to create a Lib_Search object. The constructor takes an array of the documents we want to be indexed:
$searchObj = new Lib_Search(array('table1' => array('primary_key_col', 'col2', 'col3'...), 'table2' => array('primary_key_col'...)));

The key is the name of the table. Its value is again an array: the first element is the name of the column where your primary keys live, followed by your other columns.

Next, you may want to set TTL of the cache so it can update all indices. You can do that like this:
$searchObj->setTTL(0);

It takes 1 parameter, which is a Unix timestamp. It defaults to 0 (never update), so if you don't want the indices updated you don't have to call setTTL at all. Just a small warning: if you set it to something other than 0, bear in mind that as your database grows, reindexing might run too long and hit the time limit (PHP's default execution time limit is 30 seconds) or exhaust memory because of the amount of text.

After this, you can set the minimum length of a word so that it doesn't index stuff like "I" or "a" or "or" and so on, which would eat up a lot of memory. PHP's serialize/unserialize format takes frankly quite a lot of space; on the other hand, json_decode is slower than unserialize, and finding things fast is the whole point of a search engine.
$searchObj->setMinimalStrLenght(2);

Defaults to 2.
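
The space side of that trade-off can be checked on a small sample. The sketch below compares the encoded size of one index entry under serialize and json_encode. The sample entry is made up, and the sketch measures size only, not decoding speed.

```php
<?php
// Made-up sample: one indexed word with a single COO
// (table name, primary key, primary key column name, hit score).
$entry = array('hello' => array(array('posts', 1, 'id', 2)));

// Encode the same data both ways and compare the byte counts.
$serializedSize = strlen(serialize($entry));
$jsonSize = strlen(json_encode($entry));

echo "serialize: $serializedSize bytes, json_encode: $jsonSize bytes\n";
```

serialize stores a type tag and a length for every value, which is where the extra space goes; skipping very short words keeps the number of such entries down.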

Also, you can set the maximum word count, i.e. how many words a user can search for in one request:
$searchObj->setMaxWordCount(15);

Defaults to 15.

You can also allow searching for similar words (and your users would be happy). By default, it's just a key => value sort of lookup. With this enabled, if your user searches for, let's say, "hell", the word "hello" will also be a relevant result. They just have to enter the right base word. I'm not sure whether I used the right term, but what I mean is that if they entered "hello" they can't expect to find "hell".
$searchObj->enableSimWordSearch(false);

It's disabled by default.
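
One way to read the "hell" / "hello" behaviour is as prefix matching over the indexed words. The sketch below shows that reading. It is an assumption, not the Lib_Search implementation, and findSimilarWords is a hypothetical name.

```php
<?php
// Hypothetical sketch of similar-word lookup, not the Lib_Search code.
// Returns every indexed word that starts with the searched word, so
// "hell" matches "hello", but "hello" does not match "hell".
function findSimilarWords(array $index, $needle)
{
    $matches = array();
    foreach (array_keys($index) as $word) {
        $word = (string) $word; // numeric array keys come back as integers
        if (strncmp($word, $needle, strlen($needle)) === 0) {
            $matches[] = $word;
        }
    }
    return $matches;
}

// Made-up index with empty COO lists, just to show the matching.
$index = array('hell' => array(), 'hello' => array(), 'help' => array());
$forHell = findSimilarWords($index, 'hell');
$forHello = findSimilarWords($index, 'hello');
```

Note that this scans every key, so it is slower than the plain key => value lookup; that may be one reason to keep the feature off unless you need it.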

Also, you want your users to search for something, right? You can do that by typing:
$searchObj->setSearchRequest($string);

Where $string is a raw (unfiltered) string. Don't worry, it will automatically filter it.

And finally, to actually search for our string:
$searchObj->search();


So yeah, I guess it's really easy. And one thing you might like: boolean search. If your user searches for "hello -hell", it will find everything that contains the word hello AND does not contain the word hell. Also, by typing "killing +squirrel", your user will get some info about a really nasty, brutal, killing squirrel.
* + means this word has to be contained in every result (it will search only those indices where all COOs are the same as the COO of the +word)
* - means this word won't be contained in any result (it will skip those indices where all COOs are the same as the COO of the -word). But if you search for just "-something", you won't get anything; this is forbidden.
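
The + and - rules above can be sketched with plain set operations. This is an illustration under assumptions, not the Lib_Search code: the index here maps each word to a list of document ids instead of full COOs, plain words are treated as "any of these", and both function names are hypothetical.

```php
<?php
// Hypothetical sketch of boolean query handling, not the Lib_Search code.
// Splits a query into +required, -excluded and plain (optional) words.
function parseBooleanQuery($query)
{
    $parsed = array('required' => array(), 'excluded' => array(), 'optional' => array());
    $tokens = preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $token) {
        if ($token[0] === '+' && strlen($token) > 1) {
            $parsed['required'][] = substr($token, 1);
        } elseif ($token[0] === '-' && strlen($token) > 1) {
            $parsed['excluded'][] = substr($token, 1);
        } else {
            $parsed['optional'][] = $token;
        }
    }
    return $parsed;
}

// $index maps word => list of document ids (a simplification of the COO lists).
function booleanSearch(array $index, $query)
{
    $q = parseBooleanQuery($query);
    // A query made only of -words is forbidden and yields nothing.
    if (!$q['required'] && !$q['optional']) {
        return array();
    }
    $lookup = function ($word) use ($index) {
        return isset($index[$word]) ? $index[$word] : array();
    };
    // Start with every document matching any plain word or +word.
    $result = array();
    foreach (array_merge($q['optional'], $q['required']) as $word) {
        $result = array_merge($result, $lookup($word));
    }
    $result = array_unique($result);
    // Keep only documents that contain every +word.
    foreach ($q['required'] as $word) {
        $result = array_intersect($result, $lookup($word));
    }
    // Drop documents that contain any -word.
    foreach ($q['excluded'] as $word) {
        $result = array_diff($result, $lookup($word));
    }
    sort($result);
    return $result;
}

// Made-up index: word => ids of the documents containing it.
$index = array(
    'hello' => array(1, 2),
    'hell' => array(2, 3),
    'killing' => array(4, 5),
    'squirrel' => array(5, 6),
);
```

With this sample index, booleanSearch($index, 'hello -hell') keeps only the documents that contain hello but not hell, and a query of just '-something' returns an empty array.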


=== Lib_Db ===
Added support for ODBC, PostgreSQL, SQLite2 and SQLite3. Alternatively, you can specify your own PDO configuration in your config file:
resources.database.dsn = ...

Of course it has to be supported by PDO (look it up at php.net[3])
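
For reference, these are typical DSN shapes for the drivers listed above, following the PDO driver documentation. The host, database names and paths are placeholders, and the two helper functions are hypothetical, not part of cubework. Also note that the sqlite2 driver only exists in older PHP builds.

```php
<?php
// Placeholder DSNs; hosts, names and paths are made up for illustration.
$exampleDsns = array(
    'pgsql'   => 'pgsql:host=localhost;port=5432;dbname=mydb',
    'sqlite'  => 'sqlite:/path/to/database.sq3',  // SQLite 3
    'sqlite2' => 'sqlite2:/path/to/database.sq2', // SQLite 2, older PHP builds only
    'odbc'    => 'odbc:MyDataSource',
);

// Hypothetical helper: returns the driver prefix of a DSN string,
// or null if the string has no "driver:" prefix.
function dsnDriver($dsn)
{
    $pos = strpos($dsn, ':');
    return $pos === false ? null : substr($dsn, 0, $pos);
}

// Hypothetical helper: checks whether the DSN's driver is available
// in this PHP build before you try to connect with it.
function isDriverAvailable($dsn)
{
    return class_exists('PDO')
        && in_array(dsnDriver($dsn), PDO::getAvailableDrivers(), true);
}
```

A check like isDriverAvailable can give a clearer error than the exception PDO throws for an unknown driver.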


And some other bugfixes, minor feature updates. Sorry guys I don't remember all of them.

References:
[1]: http://en.wikipedia.org/wiki/Inverted_index
[2]: http://en.wikipedia.org/wiki/Index_%28search_engine%29#Inverted_indices
[3]: http://sk2.php.net/manual/en/pdo.drivers.php

A little update

posted by gman on Oct, 23rd 2010 @ 11:42
I've made a little update to Lib_Session. I forgot to implement the GC for the file session handler because I was always using the cache one. Since this website is hosted and the administrator won't allow us to use memcached, I'm forced to care about the file session handler too. Now you need to set the session handler in your configuration file like this:

session.handler = ; you can set it to default or file or cache
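
For readers wondering what GC for a file session handler involves, here is a minimal sketch. It is not the Lib_Session code: the function name, the "sess_" file prefix and the directory layout are assumptions.

```php
<?php
// Hypothetical sketch of garbage collection for file-based sessions.
// Deletes session files that were not modified within $maxLifetime
// seconds and returns the number of files removed.
function gcSessionFiles($dir, $maxLifetime)
{
    $removed = 0;
    // Assumption: session files are named "sess_<id>" inside $dir.
    $files = glob(rtrim($dir, '/') . '/sess_*');
    if ($files === false) {
        return 0;
    }
    foreach ($files as $file) {
        if (is_file($file) && filemtime($file) + $maxLifetime < time()) {
            if (unlink($file)) {
                $removed++;
            }
        }
    }
    return $removed;
}
```

A cache-backed handler gets expiry for free from the cache's own TTL, which is why the missing GC only shows up with the file handler.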

Release

posted by gman on Oct, 16th 2010 @ 20:16
Today we've released cubework v0.8. There are still things to do but yeah. At least something.