rejetto forum

Software => HFS ~ HTTP File Server => Topic started by: rejetto on August 21, 2007, 09:31:05 AM

Title: about a search function workload
Post by: rejetto on August 21, 2007, 09:31:05 AM
hi, when a search function is implemented, i will have to face the problem of it possibly being heavy for the computer.
it should not be heavy for those who have 1000 searchable files, but it may be for those who have 20,000 files.
consider that someone may try overloading your HFS with useless searches, slowing down your computer.

so, we should think about how to avoid this problem.
since this problem doesn't affect people with few files, the solution may not even need to apply to them.
or, a limit in the "limits" menu (like an option to inhibit searching for X seconds after the last search) may be the way, but this would require the HFS admin to decide... an automatic solution would be better.

we will probably propose several options and solutions now, but then the smallest essential set should be chosen. simpler is better.
Title: Re: about a search function workload
Post by: Foggy on August 21, 2007, 09:38:22 AM
Quote
i will have to face the problem of it possibly being heavy for the computer.
it should not be heavy for those who have 1000 searchable files, but it may be for those who have 20,000 files.
consider that someone may try overloading your HFS with useless searches, slowing down your computer.

Agreed, but unfortunately I have no ideas for possible solutions.

Edit: Is HFS multi-threaded? Multiple threads could help with the workload on some computers, but not all.
Title: Re: about a search function workload
Post by: TSG on August 21, 2007, 10:02:06 AM
I agree HFS needs a search function. I ran a search for someone, and requests for the ability to search files date back to 2004 :D. I understand that it will put stress on the host system, but that is the case with any search engine...

But like Foggy says, I don't know entirely how the internals of HFS work, so I can't make any detailed suggestions.

My idea would be for HFS to cache a list of everything in its VFS. When a person makes a search, the host computer looks through this cached list and sends back the results. This list would be refreshed by rescanning the VFS for new items every 2 hours or so (maybe configurable), to keep the data recent. The exception is anything placed directly into the VFS, which would be added to the list on entry. The only limit then would be how fast the computer can search through a list of files...
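
A minimal sketch of this caching idea, in Python purely for illustration. It assumes the VFS can be approximated by a single real folder; the class name `FileIndex` and its methods are hypothetical and are not part of HFS.

```python
import os
import time


class FileIndex:
    """Cached list of file paths under one shared folder (hypothetical helper)."""

    def __init__(self, root, max_age=2 * 60 * 60, clock=time.monotonic):
        self.root = root
        self.max_age = max_age   # seconds between rescans ("every 2 hours or so")
        self.clock = clock       # injectable so the expiry logic can be tested
        self.scanned = set()     # paths found by the last rescan
        self.manual = set()      # paths registered directly via add()
        self.built_at = None     # None means "never scanned yet"

    def rescan(self):
        """Walk the folder tree once and remember every file path."""
        found = set()
        for folder, _dirs, files in os.walk(self.root):
            for name in files:
                found.add(os.path.join(folder, name))
        self.scanned = found
        self.built_at = self.clock()

    def add(self, path):
        """Register a newly added item right away, without waiting for a rescan."""
        self.manual.add(path)

    def search(self, term):
        """Case-insensitive substring match on file names, using only the cache."""
        if self.built_at is None or self.clock() - self.built_at > self.max_age:
            self.rescan()
        term = term.lower()
        hits = [p for p in self.scanned | self.manual
                if term in os.path.basename(p).lower()]
        return sorted(hits)
```

With this shape, a search touches the disk only when the cache has expired; every other search is a pass over an in-memory set. The trade-off is that files created on disk between rescans are invisible unless they are registered through `add()`.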

Also, make it so a single IP address can only run one search at a time, so they cannot have multiple pages open searching for many things. Is this possible?
Title: Re: about a search function workload
Post by: Foggy on August 21, 2007, 10:10:27 AM
TSG's idea sounds good, and the limit of only one search at a time per IP is also good.
Title: Re: about a search function workload
Post by: rejetto on August 21, 2007, 10:55:18 AM
Quote
Edit: Is HFS multi-threaded? Multiple threads could help with the workload on some computers, but not all.

it is not.
but it would not help.

Quote
This list will rescan the VFS for new items every 2 hours or so *maybe modifiable*.

i don't think everyone would be happy with it not always being up to date.

Quote
Also, make it so a single IP address can only run one search at a time, so they cannot have multiple pages open searching for many things. Is this possible?

consider that, since HFS is not multi-threaded, only one search at a time will run, just like the recursive listing.
when another search is issued, the previous one would be paused. (it may sound strange, but that's also how the listing works).
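
One way such pausing can arise in a single-threaded program is sketched below in Python. This is not HFS's actual code. It assumes each search is a generator that examines one item per step, and that the main loop always advances the most recently issued search; `search_steps` and `run` are hypothetical names.

```python
def search_steps(name, items, term):
    """Examine one item per step; yield a match, or None when there is none."""
    for item in items:
        if term in item:
            yield (name, item)
        else:
            yield None  # no match, but control still returns to the main loop


def run(arrivals):
    """Single-threaded loop that always advances the newest search.

    `arrivals` maps a step number to the search issued at that step.
    """
    arrivals = dict(arrivals)
    stack, results, step = [], [], 0
    while stack or arrivals:
        for due in sorted(k for k in arrivals if k <= step):
            stack.append(arrivals.pop(due))  # a newer search pauses the older one
        if stack:
            try:
                hit = next(stack[-1])
                if hit is not None:
                    results.append(hit)
            except StopIteration:
                stack.pop()  # newest search finished; the paused one resumes
        step += 1
    return results
```

If search A starts at step 0 over four files and search B arrives at step 2, A reports its first two matches, B then runs to completion, and only afterwards does A resume with its remaining files.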

indeed i'm not talking about having 10 searches at a time. that's not the problem i'm addressing. only 1 search at a time would take place anyway, but even that will keep your computer very busy.
most people here don't run HFS on a dedicated computer, but on their own workstation, so we should try not to overload it.
try searching your whole hard disk, and meanwhile you will see how much your computer slows down.
Title: Re: about a search function workload
Post by: rejetto on August 21, 2007, 11:06:19 AM
i guess a very good automatic way would be for HFS to detect hard disk activity, and to have an option (enabled by default) that inhibits searches while there is a lot of HD activity.

i don't know how hard this detection is to implement. :/
and to be really effective, the activity should be compared to the HD top speed... which should be measured too, oh my god! though we may take 40MB/s as an average value.
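
A rough sketch of the comparison described above, in Python for illustration only. How to read the disk counters is left open: the code takes any function that returns the cumulative number of bytes transferred so far (on a real system this could come from an OS counter or a library such as psutil, which is an assumption, not something HFS does). The 40MB/s figure is the guess from this post; the 50% "busy" fraction and the name `DiskActivityGate` are made up for the example.

```python
import time

ASSUMED_TOP_SPEED = 40 * 1024 * 1024  # bytes/s, the "40MB/s" average guess


class DiskActivityGate:
    """Refuses searches while measured disk throughput is above a threshold."""

    def __init__(self, read_total_bytes, busy_fraction=0.5,
                 top_speed=ASSUMED_TOP_SPEED, clock=time.monotonic):
        self.read_total_bytes = read_total_bytes  # callable: cumulative bytes
        self.threshold = busy_fraction * top_speed
        self.clock = clock
        self.last_bytes = read_total_bytes()
        self.last_time = clock()

    def current_rate(self):
        """Bytes per second transferred since the previous sample."""
        now, total = self.clock(), self.read_total_bytes()
        elapsed = now - self.last_time
        if elapsed <= 0:
            return 0.0  # no time has passed, so no rate can be computed
        rate = (total - self.last_bytes) / elapsed
        self.last_bytes, self.last_time = total, now
        return rate

    def search_allowed(self):
        """True when the disk looks idle enough to start a search."""
        return self.current_rate() < self.threshold
```

Using a fixed top speed avoids having to benchmark the disk, at the cost of being too strict on slow disks and too lenient on fast ones.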
Title: Re: about a search function workload
Post by: MarkV on August 22, 2007, 01:03:27 AM
When someone (identified by IP) does many searches (a possible DoS attack), then after a while, artificially slow them down.
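
A minimal sketch of such a slowdown, in Python for illustration. All numbers are placeholders, since the thread leaves the limits open: the first 5 searches per minute are free, and each further one adds 2 seconds of delay up to a cap. `SearchThrottle` is a hypothetical name.

```python
import time
from collections import defaultdict, deque


class SearchThrottle:
    """Computes an artificial delay that grows with recent searches per IP."""

    def __init__(self, free_searches=5, window=60.0, step=2.0,
                 max_delay=30.0, clock=time.monotonic):
        self.free_searches = free_searches  # searches per window with no delay
        self.window = window                # seconds of history that count
        self.step = step                    # extra delay per search over the limit
        self.max_delay = max_delay
        self.clock = clock
        self.history = defaultdict(deque)   # ip -> timestamps of recent searches

    def delay_for(self, ip):
        """Record one search for `ip` and return how long to make it wait."""
        now = self.clock()
        recent = self.history[ip]
        while recent and now - recent[0] > self.window:
            recent.popleft()                # forget searches outside the window
        recent.append(now)
        extra = max(0, len(recent) - self.free_searches)
        return min(self.max_delay, extra * self.step)
```

The caller would wait for the returned number of seconds before starting the search. Occasional users never notice it, while a flood of requests from one address gets progressively slower.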
Title: Re: about a search function workload
Post by: rejetto on August 22, 2007, 01:06:45 AM
how many?
consider that searches will be sequential, not concurrent.

maybe we will have a clearer view when the function is available.
Title: Re: about a search function workload
Post by: MarkV on August 22, 2007, 01:14:39 AM
Quote
how many?

I think you are right, the exact limits should be adjusted later based on experience in beta.
Title: Re: about a search function workload
Post by: yhm_7 on August 22, 2007, 07:25:17 AM
Quote
how many?
consider that searches will be sequential, not concurrent.

maybe we will have a clearer view when the function is available.

we can study how the search function of some BBSes works.
The admin has the right to set the minimum interval between two searches from the same IP.

e.g. set the interval to 15 seconds.
A search will be refused if less than 15s have passed since the previous search from that IP.
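
The interval rule above could look roughly like this in Python (illustration only; `SearchInterval` is a hypothetical name, and the choice that a refused attempt does not restart the timer is an assumption).

```python
import time


class SearchInterval:
    """Refuses a search if the same IP searched less than `min_interval` ago."""

    def __init__(self, min_interval=15.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds, set by the admin
        self.clock = clock
        self.last_search = {}             # ip -> time of last accepted search

    def allow(self, ip):
        """Return True and record the search, or False if it comes too soon."""
        now = self.clock()
        last = self.last_search.get(ip)
        if last is not None and now - last < self.min_interval:
            return False                  # refused; the timer is not restarted
        self.last_search[ip] = now
        return True
```

Each address is tracked separately, so one impatient visitor does not block searches from anyone else.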
Title: Re: about a search function workload
Post by: rejetto on August 22, 2007, 08:16:05 AM
i may count the time spent searching, and it could be limited to 1 minute every 10 minutes, per address.
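
A small sketch of such a time budget, in Python for illustration. It assumes the time spent by each search is recorded when the search ends, and that a sliding 10-minute window is used; `SearchTimeBudget` and its methods are hypothetical names.

```python
import time
from collections import defaultdict, deque


class SearchTimeBudget:
    """Tracks seconds spent searching per address within a sliding window."""

    def __init__(self, budget=60.0, window=600.0, clock=time.monotonic):
        self.budget = budget             # 1 minute of searching...
        self.window = window             # ...per 10 minutes
        self.clock = clock
        self.spent = defaultdict(deque)  # ip -> (finished_at, seconds) entries

    def record(self, ip, seconds):
        """Note that a search for `ip` has just used `seconds` of search time."""
        self.spent[ip].append((self.clock(), seconds))

    def remaining(self, ip):
        """Seconds of search time `ip` may still use right now."""
        now = self.clock()
        log = self.spent[ip]
        while log and now - log[0][0] > self.window:
            log.popleft()                # entries older than the window expire
        used = sum(seconds for _finished, seconds in log)
        return max(0.0, self.budget - used)
```

A search loop could check `remaining()` before starting and stop once the time it has consumed reaches that value. Unlike a fixed interval between searches, this limits the actual load, so many quick searches are fine but one long search uses up the budget.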
Title: Re: about a search function workload
Post by: maverick on August 23, 2007, 12:59:06 PM
I agree a search function is needed for those who currently don't have that feature built into their sites.

However, don't forget that some of us already use a commercial search engine. For that reason, make sure that HFS's built-in search function can be turned off, or is off by default.