Aug 29, 2015 by admin

Content searching on paginated websites

It is not that common, but there are websites that won't let you enter text and search by it. In that case you can use the console in your browser.

What's more, some of them are so-called one-page websites. If you still want to search for something and you don't have time to figure out the site's API, you have to write a few small functions.

Here is an example (using vanilla JS, in case the site isn't using MooTools, jQuery or anything similar):

If you are using an older browser, it may not have document.getElementsByClassName, so you have to define something similar yourself. This function is a little different: it returns a single element or false, not an array-like collection (as the native function in newer browsers does).
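
A minimal sketch of such a helper might look like the following. The name getElementByClassName and the optional root parameter are assumptions made for illustration; it relies only on getElementsByTagName, which very old browsers already support.

```javascript
// Hypothetical helper: returns the first element whose class list contains
// className, or false when nothing matches. `root` is optional and
// defaults to the global document.
function getElementByClassName(className, root) {
  root = root || document;
  // getElementsByTagName('*') returns every element under root.
  var all = root.getElementsByTagName('*');
  var needle = ' ' + className + ' ';
  for (var i = 0; i < all.length; i++) {
    // Normalise whitespace and pad with spaces so that 'next'
    // does not match 'next-page'.
    var classes = ' ' + String(all[i].className || '').replace(/\s+/g, ' ') + ' ';
    if (classes.indexOf(needle) !== -1) {
      return all[i];
    }
  }
  return false;
}
```

In the console you would call it as getElementByClassName('next') and check the result for false before using it.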

Let's assume you have to check the loaded content for the text you are looking for. Then you need a function that takes care of this:
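
As a rough sketch, assuming the content of interest lives somewhere under document.body and that a case-insensitive match is good enough, the check could be written like this (pageContains is a hypothetical name):

```javascript
// Hypothetical helper: true when `text` occurs in the text content of
// `container` (defaults to document.body). The comparison ignores case.
function pageContains(text, container) {
  container = container || document.body;
  // textContent is the standard property; innerText covers old
  // versions of Internet Explorer that lack it.
  var content = container.textContent || container.innerText || '';
  return content.toLowerCase().indexOf(String(text).toLowerCase()) !== -1;
}
```

Passing a narrower container (for example the element holding the result list) avoids matches in menus or footers.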

Next you need some way of evaluating the return value of the previous function. I named it testPage:
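
One possible shape for testPage, shown only as a sketch: it receives the true/false result of the text check, reports it in the console, and tells the caller whether the search can stop. The text and page parameters are assumptions used purely for the log message.

```javascript
// Sketch of testPage: `found` is the result of the text check.
// Returns true when the search can stop, false when it should continue.
function testPage(found, text, page) {
  if (found) {
    console.log('Found "' + text + '" on page ' + page + '.');
    return true;
  }
  console.log('Nothing on page ' + page + ', moving on...');
  return false;
}
```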

And since you want to keep running this check page after page, you need some kind of loop:
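
A self-contained sketch of such a loop is shown below. Everything configurable here is an assumption: the '.next' selector for the next-page control, the 2000 ms delay, the stopSearch flag and the option names. It uses querySelector for brevity, so on very old browsers a class-based lookup would have to be swapped in.

```javascript
// Set to true from the console to stop before the next repeated call.
var stopSearch = false;

// Sketch of the loop. Assumed options:
//   nextSelector - CSS selector of the "next page" control
//   delay        - milliseconds to wait for the next page to load
//   doc          - document to work on (defaults to the global document)
//   schedule     - timer function (defaults to a setTimeout wrapper)
function go(text, options) {
  options = options || {};
  var doc = options.doc || document;
  var delay = options.delay || 2000;
  var nextSelector = options.nextSelector || '.next';
  var schedule = options.schedule || function (fn, ms) {
    return setTimeout(fn, ms);
  };
  var needle = String(text).toLowerCase();
  var page = 1;

  function step() {
    if (stopSearch) {
      console.log('Stopped on page ' + page + '.');
      return;
    }
    var content = (doc.body.textContent || '').toLowerCase();
    if (content.indexOf(needle) !== -1) {
      console.log('Found "' + text + '" on page ' + page + '.');
      return;
    }
    var next = doc.querySelector(nextSelector);
    if (!next) {
      console.log('No next page after page ' + page + ', giving up.');
      return;
    }
    console.log('Nothing on page ' + page + ', loading the next one...');
    next.click();
    page += 1;
    // Plain timeout: give the site time to load the next page.
    schedule(step, delay);
  }

  step();
}
```

Typing stopSearch = true in the console ends the search at the next check.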

I added a somewhat lame stop condition, which prevents the next repeated call. I haven't created any asynchronous support in the form of promises; instead I used only a plain timeout. Within this time the whole content of the next page should be loaded. If it isn't, add some promise support* or change the timeout to a bigger number.

By running go('i am looking for this') in the console, you will see the progress and, hopefully, the requested item after x pages.

What if my page has no unique identification by class / id?

Then you have to use other methods. You could simply search the 'a' elements by their href.
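
A small sketch of that lookup, with findLinkByHref as a hypothetical name and a plain substring match as a simplifying assumption:

```javascript
// Hypothetical helper: returns the first 'a' element whose href attribute
// contains `hrefPart`, or false when there is none.
function findLinkByHref(hrefPart, root) {
  root = root || document;
  var links = root.getElementsByTagName('a');
  for (var i = 0; i < links.length; i++) {
    // getAttribute returns the value as written in the markup, while
    // the .href property would return the resolved absolute URL.
    var href = links[i].getAttribute('href') || '';
    if (href.indexOf(hrefPart) !== -1) {
      return links[i];
    }
  }
  return false;
}
```

Because this is a substring match, 'page=1' would also match 'page=10', so the search string should be as specific as the site's URLs allow.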

Now you have to keep the current page in a variable, since you will be searching by the URL of the next page. Everything put together in the go function looks like this:
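
The version below is a sketch under stated assumptions rather than a definitive implementation: it supposes the next-page links contain 'page=N' in their href, that numbering starts at 1, and that 2000 ms is enough for a page to load. The option names and the stopSearch flag are likewise made up for illustration.

```javascript
// Set to true from the console to stop before the next repeated call.
var stopSearch = false;

function go(text, options) {
  options = options || {};
  var doc = options.doc || document;
  var delay = options.delay || 2000;
  var schedule = options.schedule || function (fn, ms) {
    return setTimeout(fn, ms);
  };
  var needle = String(text).toLowerCase();
  var currentPage = options.startPage || 1;

  // Find the link leading to the given page number by its href.
  function findPageLink(pageNumber) {
    var wanted = 'page=' + pageNumber;
    var links = doc.getElementsByTagName('a');
    for (var i = 0; i < links.length; i++) {
      var href = links[i].getAttribute('href') || '';
      var at = href.indexOf(wanted);
      // A digit right after the match means a longer number,
      // e.g. 'page=10' when looking for 'page=1'.
      var after = href.charAt(at + wanted.length);
      if (at !== -1 && !/\d/.test(after)) {
        return links[i];
      }
    }
    return false;
  }

  function step() {
    if (stopSearch) {
      console.log('Stopped on page ' + currentPage + '.');
      return;
    }
    var content = (doc.body.textContent || '').toLowerCase();
    if (content.indexOf(needle) !== -1) {
      console.log('Found "' + text + '" on page ' + currentPage + '.');
      return;
    }
    var next = findPageLink(currentPage + 1);
    if (!next) {
      console.log('No link to page ' + (currentPage + 1) + ', giving up.');
      return;
    }
    console.log('Nothing on page ' + currentPage + ', loading the next one...');
    next.click();
    currentPage += 1;
    // Plain timeout: give the site time to load the next page.
    schedule(step, delay);
  }

  step();
}
```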

Happy hunting.

(*If you know precisely which pages you can simply download (like '/search?page=xxx'), you can use the XMLHttpRequest object with promise support (or a callback); more here.)
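
The promise-based approach from the footnote could be sketched roughly as follows. fetchPage and searchPages are hypothetical names, the '/search?page=N' pattern is taken from the example URL above, and native Promise support in the browser is assumed.

```javascript
// Download one page with XMLHttpRequest wrapped in a Promise.
// The second parameter only exists so that another constructor can be
// swapped in; in the browser it defaults to the global XMLHttpRequest.
function fetchPage(url, XHR) {
  XHR = XHR || XMLHttpRequest;
  return new Promise(function (resolve, reject) {
    var request = new XHR();
    request.open('GET', url);
    request.onload = function () {
      if (request.status >= 200 && request.status < 300) {
        resolve(request.responseText);
      } else {
        reject(new Error('Request failed with status ' + request.status));
      }
    };
    request.onerror = function () {
      reject(new Error('Network error while loading ' + url));
    };
    request.send();
  });
}

// Requests '/search?page=1', '/search?page=2', ... one after another.
// Resolves with the first page number whose response contains `text`,
// or with false once `lastPage` has been passed without a match.
function searchPages(text, page, lastPage, XHR) {
  if (page > lastPage) {
    return Promise.resolve(false);
  }
  return fetchPage('/search?page=' + page, XHR).then(function (body) {
    if (body.indexOf(text) !== -1) {
      return page;
    }
    return searchPages(text, page + 1, lastPage, XHR);
  });
}
```

In the console this would be used as searchPages('i am looking for this', 1, 50).then(function (page) { console.log(page); }); no timeout is needed, because each request starts only after the previous one has finished.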