
Issues downloader for Bitbucket

This is an example of how to create a simple downloader for websites that sit behind a login mechanism – for example Bitbucket issues. While the target of your crawling could differ and go who knows where, my case was finite, which made it slightly easier, since I knew the base URIs before I even started. Long story short – here is code which can back up your pages while handling POST data & cookies, written purely in Node.js.

What I needed for the issues downloader:

  • Node.js runtime (of course)
  • some way to make HTTP requests easily
  • entry point & ending point – basically where to log in and when to end
  • HTTPS certificates (explained later)
  • something for parsing HTML (a jQuery-like module, e.g. cheerio)
  • some output channel for the backup – preferably the filesystem module (fs)

 

From my point of view, Node.js is as suitable for this job as Python or Perl or whatever other interpreted language. The downside of Node as a “callback-providing engine” is nested calls when you are lazy. Fortunately, this code won’t have more than 200 lines at most. The first thing I needed was to create an authorization request, so let’s check the login page, since I don’t know whether Bitbucket has an API for that.

 

(screenshot: the Bitbucket login form fields in developer tools)

So according to the developer tools, I see only a next field, a csrf field, email and password. The first one is probably not important, but just in case, let’s include it. The second one is the token holder, which should be as important as the email or password. As is common, the csrf value is duplicated in the cookies.

 

(screenshot: the cookies with the duplicated csrf token)

 

That means that for authorization I first need to scrape the login page for the required data and then forge an HTTP POST request which looks the same as the one sent by the login form. Let’s write some code.

First – include the required modules (of course – npm install xxx) – all of them right now, since I know I will need them. Everything should be self-explanatory, maybe except rimraf. That is a module for recursive directory removal. Since I will be storing the HTML pages, I will need some directory, and by directory I mean a CLEAN directory.
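
A minimal sketch of the includes (treat the exact set as my assumption):

var request = require('request');   // easy HTTP requests, POST data & cookies
var cheerio = require('cheerio');   // jQuery-like HTML parsing
var fs      = require('fs');        // output channel - the filesystem
var rimraf  = require('rimraf');    // recursive directory removal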

 

Second step – retrieve the login page:

I started writing in functions, so it will be easier later when joining the code. As simple as it looks, the first function just downloads the login page. Basically all I need is the csrf token, which is duplicated – one instance is in the form, while the second is in the cookies. Since I don’t want to parse the HTML only for one token, I grab it from the second occurrence.
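
A sketch of that function – the exact login URL and the csrftoken cookie name are assumptions from my setup:

var loginUrl = 'https://bitbucket.org/account/signin/';

function getLoginPage(callback) {
    request.get(loginUrl, function (err, response, body) {
        if (err) { return callback(err); }
        // the token is duplicated - the second occurrence lives in the set-cookie header,
        // so there is no need to parse the html form with cheerio
        var cookies = response.headers['set-cookie'].join(';');
        var csrf = /csrftoken=([^;]+)/.exec(cookies)[1];
        callback(null, csrf);
    });
}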

 

Third step – attempt to log in. Since I have already checked the login page, I know that the URL which handles the POST request from the form is the same URL as the login page requested through a classic HTTP GET request (i.e. in the browser), so there is no need to change the URL from the previous step. Just create the same request, but change the method and add some data.
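
A sketch of that attempt – the form field names are taken from the login form above, but treat them as assumptions and double-check the real names in the developer tools:

function login(csrf, callback) {
    request.post({
        url: loginUrl,
        form: {
            next: '/',
            csrfmiddlewaretoken: csrf,
            email: 'my@email.com',
            password: 'my-password'
        }
    }, function (err, response, body) {
        callback(err, response, body);
    });
}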

If you run this code, you will see that something is not right. Even though the HTTP response status code is 200 (OK), the body doesn’t contain the page which you would normally get after login. When you debug the body variable, you will see the page and, thankfully, the site shows you where exactly the mistake was. Well, more like mistakes.

That probably means that the request above didn’t send the csrf cookie, so the server couldn’t do the csrf checkup – from the request’s perspective. The request module doesn’t send cookies between requests automatically; it has a flag for it – named ‘jar’. It can be enabled globally – like this:
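
// reuse one cookie jar for every request made through this module
request = request.defaults({ jar: true });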

The second problem requires providing an additional header. The idea behind this is generally to stop CSRF attacks. No problem, just change the request to this:
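
In my case the missing piece was the Referer header (Bitbucket’s backend is Django-based and checks it for HTTPS POST requests) – an assumption you should verify against the error text you got:

request.post({
    url: loginUrl,
    headers: {
        Referer: loginUrl
    },
    form: {
        next: '/',
        csrfmiddlewaretoken: csrf,
        email: 'my@email.com',
        password: 'my-password'
    }
}, function (err, response, body) {
    callback(err, response, body);
});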

Note: when you run this code and print the output, the HTML will still contain the CSRF verification problem text. That is because the server would normally redirect the visitor. For the complete response you can add followAllRedirects: true (it will redirect the POST HTTP request to the dashboard – parameter next – ‘/’).

 

Fourth step – download one issue page and save it. For this we have to store the project URL and the issue number – the URL of an issue works even without the issue name in it (i.e. https://bitbucket.org/<user>/<repo>/issues/<issueNumber>).

And since I am going to save the downloaded page into a directory, let’s delete that directory first and then create it anew.
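
A sketch of this step – &lt;user&gt; and &lt;repo&gt; are placeholders as above:

var issuesDir = __dirname + '/issues';
var issueBaseUrl = 'https://bitbucket.org/<user>/<repo>/issues/';

// delete the output directory and create it anew
rimraf.sync(issuesDir);
fs.mkdirSync(issuesDir);

function downloadIssue(issueNumber, callback) {
    var dir = issuesDir + '/' + issueNumber;
    fs.mkdirSync(dir);
    request.get(issueBaseUrl + issueNumber, function (err, response, body) {
        if (err) { return callback(err); }
        // stored e.g. as /issues/1/1.html
        fs.writeFile(dir + '/' + issueNumber + '.html', body, callback);
    });
}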

OK, these are the basics. When you run the script, it downloads the page into the defined directory, e.g. /issues/1/1.html. This downloaded page is fortunately (thank you Bitbucket) able to show everything you normally see when you go through the login process in your browser and navigate to the URL of the issue (CSS + JS working). But you may notice a little problem – if you aren’t currently logged in while checking the downloaded page, you won’t see the images uploaded to the issue – they are behind another redirect and physically stored somewhere in the Amazon cloud.

 

So without an active auth session, you won’t see the uploaded images – and that will be covered in part two of this article.

Note: bitbucket.org offers an export of your issues, but that works only if you have admin privileges. Otherwise there is no API for you to casually download those issues as JSON or even raw, besides doing it manually. The Chrome store has some plugins, but seriously – you have to give them your credentials, which I simply don’t want to do.


ES6 Proxy and catch-all method

As I talked about ECMAScript 2015 / ES6 with a few people, I realized that many features are still a mystery in the community. One of them is the Proxy object. Besides the classic language changes there is a handful of goodies which I particularly like.

 

General Proxy

What is the Proxy object? If you haven’t heard of it yet, let’s look at this example schema of a classical proxy server:

Network Proxy in real life

There are many uses for proxy servers; just picture this one: the proxy lies between the target and the source of the request and masks the source’s identity, so the target does not know about the real computer behind the proxy. This concept is similar to NAT (although it works differently). As you can see from the schema above – the proxy server can see and optionally alter the response or the request. And for the goal of this article it’s enough to know that the JavaScript Proxy object masks calls to methods/attributes of another object – it basically wraps the target object. Since it is a wrapper and is “trapping” all calls, it can easily alter them.

In contrast to some other languages, JavaScript lacked this functionality until a few years ago. It is said that the Proxy is similar to Python’s attribute-access methods or PHP’s catch-all method, or that it implements meta-programming or is the basis for defensive programming. You can definitely put it to many good uses, that’s for sure.
(Note – this article considers support in Node.js rather than browsers.) The Proxy object has been around as an idea for several years (as a planned feature of ECMAScript 2015 / ES6) but wasn’t fully implemented even in Node.js until recently (6.0.0, it looks like). Even so, many enthusiasts have been using shims/transpilers/workarounds to be able to use Proxy objects before that. Now they don’t have to complicate things. Back in the good ol’ days of Node.js 0.12.x, developers had another choice – to use the old Proxy API, which was somewhat less flexible than the new API. Even then, they had to enable the harmony feature(s) to be able to run their code.

 

You can find an archive of this old API in the Mozilla docs.

Let’s talk about the small difference between the old API and the new API – the basic usage:
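
A rough comparison (the handler is deliberately left out here):

// old API (pre-ES6, behind the harmony flag)
var oldProxy = Proxy.create(handler, proto);

// new API (ES6 / Node.js >= 6)
var newProxy = new Proxy(target, handler);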

 

The new one unifies the previous two methods. The target argument in the new API accepts an array, a generic object, even a proxy or a function. The old API was more prone to mistakes, as can be seen below, and this was probably the reason why it was updated. The handler argument alone is basically the same and could be used without changes in both APIs – it is basically an object which holds functions, or rather traps (full list e.g. here). I will skip the basics and go for a slightly more advanced usage, since I don’t believe anybody would be using proxies with only the handler (i.e. without meaningful target / proto arguments).

 

This will basically be about implementing __noSuchMethod__ in objects, which could be described as a catch-all method. First – the old API:
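
Roughly like this (a sketch of the old API; on the 0.12.x line it needs the harmony flag to run):

function Dummy() {
    this.name = 'dummy';
}

function bundle() {
    var instance = new Dummy();
    return Proxy.create({
        get: function (receiver, key) {
            if (key in instance) {
                return instance[key];
            }
            // catch-all: any unknown property resolves to a function
            return function () {
                console.log('no such method:', key);
            };
        }
    });
}

var dummy = bundle();
dummy.sayHello();   // no such method: sayHello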

 

While this code works, it’s not exactly what I want – I don’t want to call the bundle function and create the instance of Dummy inside it… let’s put it somewhere else.
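
For example into the constructor itself – a sketch:

function Dummy() {
    this.name = 'dummy';
    var self = this;
    // returning the proxy from the constructor replaces the created instance
    return Proxy.create({
        get: function (receiver, key) {
            if (key in self) {
                return self[key];
            }
            return function () {
                console.log('no such method:', key);
            };
        }
    });
}

var dummy = new Dummy();
dummy.sayHello();   // no such method: sayHello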

 

Ok, that’s better. But now I am contradicting my initial statement that I want to use both parameters. Moreover, this code is not the best, since the function in the get trap would be created every time a new Dummy is created. One way of fixing this is to move the function out of the constructor.
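
Something like this – note the variable in the outer scope:

var lastInstance;

function catchAllGet(receiver, key) {
    if (key in lastInstance) {
        return lastInstance[key];
    }
    return function () {
        console.log('no such method:', key);
    };
}

function Dummy() {
    this.name = 'dummy';
    lastInstance = this;   // overwritten by every new Dummy()
    return Proxy.create({ get: catchAllGet });
}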

 

And there is another problem – it uses a global variable, which would be overwritten every time a new Dummy is created in this scope. Somehow, the reference to the created object has to be passed into the get function. But that won’t be an issue with the new API, as the code below simply works, since we are not passing a non-instantiated Dummy, but the complete object.
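
A sketch with the new API – the handler is created once and the instance itself is passed in as the target:

'use strict';

const handler = {
    get(target, key) {
        if (key in target) {
            return target[key];
        }
        return function () {
            console.log('no such method:', key);
        };
    }
};

class Dummy {
    constructor() {
        this.name = 'dummy';
        return new Proxy(this, handler);
    }
}

const dummy = new Dummy();
console.log(dummy.name);   // dummy
dummy.sayHello();          // no such method: sayHello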

 

Conclusion

The last code example is a crude yet effective way to implement a catch-all method in your objects, which in turn could help you tremendously when creating a large project with many models / controllers while fully using inheritance. Personally, I see a big difference when writing code purely in ES6 (August 2016 – however, still using Babel with ES6). Then again – I would prefer an even better usage – if not a built-in method for each object, then at least directly extending the Proxy prototype (or “class” if we are talking about ES6), which is an idea for part 2 of this article.

 


Content searching on paginated websites

Even if it is not so common, there are websites which won’t allow you to enter text and search by it. But you can utilize the console in your browser.

Moreover, what if they are so-called one-page websites? If you still want to search for something and you don’t have time to expose the API, then you have to write some small functions.

 

Here is an example (using vanilla JS – assuming the web isn’t using MooTools or jQuery or anything else):
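
Something along these lines – the class name next-button is only my assumption, use whatever your site marks its pagination control with:

function nextPage() {
    var button = document.getElementsByClassName('next-button')[0];
    if (!button) {
        return false;
    }
    button.click();
    return true;
}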

If you are using some older browser, it may not have document.getElementsByClassName, so you have to define something yourself. This function is a little different – it returns a single element or false, not a collection (as the default function in newer browsers does).
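
A possible definition of such a helper (registered here as document.getElementByClassName – note the singular):

if (!document.getElementByClassName) {
    document.getElementByClassName = function (className) {
        var all = document.getElementsByTagName('*');
        for (var i = 0; i < all.length; i++) {
            if ((' ' + all[i].className + ' ').indexOf(' ' + className + ' ') !== -1) {
                return all[i];   // single element, not a collection
            }
        }
        return false;
    };
}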

 

Let’s assume you have to check the loaded content for the text you are looking for. Then you need a function which governs this:
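
For example (the content class is an assumption – it falls back to document.body if there is no such wrapper):

function containsText(text) {
    var wrapper = document.getElementsByClassName('content')[0] || document.body;
    return wrapper.textContent.indexOf(text) !== -1;
}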

 

Next you need some way of evaluating the return value of the last function. I named it testPage:
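
Roughly like this:

function testPage(text) {
    if (containsText(text)) {
        console.log('Found it on this page!');
        return true;
    }
    console.log('Nothing here, moving to the next page...');
    return false;
}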

 

And since you want to run this check indefinitely, you have to apply some cycling:
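
A sketch of the cycling go function – the 3000 ms timeout is a guess, adjust it to how fast your pages reload:

function go(text) {
    if (testPage(text)) {
        return;   // the lame stop condition - just don't schedule the next run
    }
    nextPage();
    setTimeout(function () {
        go(text);
    }, 3000);
}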

I added a little lame stop condition, which simply stops the next repeated call. I haven’t created any asynchronous support in the form of a promise; instead I used only a plain timeout. In this time the whole content of the next page should be reloaded. If not – create some promise support* or change the timeout value to a bigger number.

By running go('i am looking for this') in the console, you will see the progress and hopefully the requested item after x pages.

 

What if my page has no unique identification by class / id?

Then you have to use other methods. You could just search by the href of an ‘a’ element.
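
For example like this:

function findLinkByHref(href) {
    var links = document.getElementsByTagName('a');
    for (var i = 0; i < links.length; i++) {
        if (links[i].getAttribute('href') === href) {
            return links[i];
        }
    }
    return false;
}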

 

Now you have to keep the current page in a variable, since you will be searching by the next page’s URL. Everything put into the go function looks like this:
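
A sketch, assuming the pagination links look like '/search?page=2', '/search?page=3' and so on:

var currentPage = 1;

function go(text) {
    if (testPage(text)) {
        return;
    }
    currentPage++;
    var link = findLinkByHref('/search?page=' + currentPage);
    if (!link) {
        console.log('No more pages.');
        return;
    }
    link.click();
    setTimeout(function () {
        go(text);
    }, 3000);
}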

 

Happy hunting.

 

(*If you know precisely which pages you can just download (like ‘/search?page=xxx’), you can use the XMLHttpRequest object with promise support (or a callback) – more here.)


Neural net-like canvas animation (2)

How to use the HTML5 canvas to create an animation that looks like a neural net (part 2)

I took a slightly harder way to start, since in the first part I introduced the Node model, which is more complicated than the simple Line model. Here it is:
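
A sketch of the Line model, assuming the Base model and the ctx / color attributes pushed in by the lib from the first part:

function Line(nodeA, nodeB) {
    // no init() call needed - the line owns nothing besides the two node references
    this.nodeA = nodeA;
    this.nodeB = nodeB;
}

Line.prototype = Object.create(Base.prototype);

// stoic animation only - simply follow the current positions of both nodes
Line.prototype.render = function () {
    this.ctx.beginPath();
    this.ctx.moveTo(this.nodeA.pos.x, this.nodeA.pos.y);
    this.ctx.lineTo(this.nodeB.pos.x, this.nodeB.pos.y);
    this.ctx.strokeStyle = this.color;
    this.ctx.stroke();
};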

 

The Line model is quite easy to grasp. It is a basic line between two nodes. Because all the movement is stored in the Node model, it doesn’t require any actions besides the stoic animation, which only follows those two nodes. As a plus – I don’t even need to call the init method.

But with the line accessing the node’s pos attribute, it is obvious we can’t move, because that attribute is updated only after the animation is complete. We have a few options, with 2 listed here:

  • use a reference, so the pos attribute is always the current one (or rather the next one) – preferred
  • update the values after every iteration

 

Our first app.js script only works with one node. Now it should utilize two nodes and one line:
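
It could look roughly like this (the canvas id and the start positions are placeholders):

var lib = new Lib(document.getElementById('canvas'));

var nodeA = new Node({ x: 100, y: 100 });
var nodeB = new Node({ x: 300, y: 200 });
var line = new Line(nodeA, nodeB);

lib.addObject(nodeA);
lib.addObject(nodeB);
lib.addObject(line);

nodeA.moveTo({ x: 250, y: 80 });
nodeB.moveTo({ x: 120, y: 260 });

lib.renderCycle();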

After you run it, you should see two points moving around while constantly being connected by a line. Easy as that.

 

Let’s update our basic recommendations for models and add this:

  • Each animation should be stored as an object. This object should have its own context (in terms of attributes).
  • An animation object should therefore consist of a blueprint action (basically unique for each model, so it can be stored inside the model itself instead of another script) and a function which prepares this action.

 

The render method now utilizes disposableRenderActions. There could be times when we need to queue more animations, and the old splice-to-dispose could break the for cycle over renderActions. With this, all finished animations are removed at the end of the render call. For simplicity, it is placed in the Node model instead of the Base model. If you are wondering about the ‘this.renderActions[i].iteration();’ part, check the code below.
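
A sketch of the reworked render under my assumptions (the paint helper is illustrative, and renderActions / disposableRenderActions are expected to be initialized in init):

Node.prototype.render = function () {
    if (this.renderActions.length === 0) {
        this.paint();   // stoic repaint of the current position
        return;
    }
    for (var i = 0; i < this.renderActions.length; i++) {
        this.renderActions[i].iteration();
    }
    // finished animations are disposed of only now, at the end of the render call
    for (var j = 0; j < this.disposableRenderActions.length; j++) {
        var index = this.renderActions.indexOf(this.disposableRenderActions[j]);
        if (index !== -1) {
            this.renderActions.splice(index, 1);
        }
    }
    this.disposableRenderActions = [];
};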

 

Following my notes about models, each animation object should have its render part stored in the actions attribute. With this, the model won’t be the one that repaints the context – at the core, it should be the action of that model. In the new moveTo method, the rendering is called through a bind. That gives more freedom while rendering, especially with attributes which belong to the animation object.

An animation object has two basic callable methods – iteration and finished. According to the previous code, a call to the finished method disposes of the associated animation right after the current render call. As before, this occurs when the animation reaches the iteration max counter.
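
A rough sketch of the new moveTo – the actions attribute, the paint helper and the event name are my illustrative choices, not the exact original code:

Node.prototype.actions = {
    // blueprint action for moveTo - 'this' is bound to the animation object,
    // the node itself is passed in as an argument
    moveTo: function (node) {
        this.count++;
        node.pos.x += this.step.x;
        node.pos.y += this.step.y;
        node.paint();
        if (this.count >= this.max) {
            this.finished();
        }
    }
};

Node.prototype.moveTo = function (nextPos) {
    var node = this;
    var animation = {
        count: 0,
        max: this.defaults.maxIterations,
        step: {
            x: (nextPos.x - this.pos.x) / this.defaults.maxIterations,
            y: (nextPos.y - this.pos.y) / this.defaults.maxIterations
        },
        finished: function () {
            node.disposableRenderActions.push(this);
            node.event('moveTo:finished');
        }
    };
    animation.iteration = this.actions.moveTo.bind(animation, this);
    this.renderActions.push(animation);
};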

 

With this, the Node model is prepared for another type of method – behavioral.
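
One way to do it – wander and randomPos are illustrative names:

// helper returning a random position inside an assumed 800x600 canvas
Node.prototype.randomPos = function () {
    return { x: Math.random() * 800, y: Math.random() * 600 };
};

// behavioral method - endlessly repeat a random moveTo
Node.prototype.wander = function () {
    var self = this;
    this.on('moveTo:finished', function () {
        self.moveTo(self.randomPos());
    });
    this.moveTo(this.randomPos());
};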

The point of this method is to repeatedly call some animation. It could be done as above, or it could be made into a bigger moveTo method which resets the iteration count and sets new positions. I wanted it to be separated, so I picked the first option. The code above demonstrates how to do it BUT also demonstrates the necessity of implementing pseudo-asynchronous control flow, which I pointed out in the previous article in the Base model (the on and event methods).

The thing is, with a plain callback the repeated moveTo call would be disposed of even before the increment call. That’s because the callback would be called before the render cycle ends, therefore the disposableRenderActions part would always remove the new animation instead of the old one.

 

The on & event mechanism is an example of preventing such a thing from happening, but it also provides more flexibility in other areas. Please note that for parallel animations it is still lacking something.

 

And finally, app.js:
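
A sketch of it (the canvas size and the naming are placeholders):

var lib = new Lib(document.getElementById('canvas'));
var nodes = [];
var count = 40;

for (var i = 0; i < count; i++) {
    var node = new Node({
        x: Math.random() * 800,
        y: Math.random() * 600
    });
    nodes.push(node);
    lib.addObject(node);
}

// one line for every two neighboring nodes, including last -> first
for (var j = 0; j < count; j++) {
    lib.addObject(new Line(nodes[j], nodes[(j + 1) % count]));
}

nodes.forEach(function (node) {
    node.wander();
});

lib.renderCycle();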

This creates 40 nodes and, for every 2 neighboring nodes, one additional line – that includes a line between the first and the last node.
Second example.


Neural net-like canvas animation

How to use the HTML5 canvas to create an animation that looks like a neural net

A few months ago I saw an animation which you can see even in some movies – it looks like a neural network with all of these nodes and lines. I thought it would be nice to do it in the HTML canvas. The whole net should be able to generate random movements and, as a bonus, it should be able to react to mouse events (demo at the bottom).

For the gist of this idea, you can look at the following image, which serves as the basis: http://cdn.phys.org/newman/gfx/news/hires/neuralnetwor.jpg

(*side note – I believe there is some lib on GitHub which does the same thing, and even better, as a proper library)

 

This example doesn’t use any third-party lib, just vanilla JavaScript, so it can be added anywhere without dependencies.

As I planned the structure, I prepared one lib script and three models for starters:

  1. Base model
  2. Node model
  3. Line model

The Base model should contain basic functionality which the other models inherit.
The Node model stands for those dots which are the points between lines – like joints.
And finally lines – between each two nodes there is always a line which connects them.

 

Let’s create the HTML template:

There is nothing much to comment on – just a plain HTML template with a style element and virtual paths to the scripts.
I intend to grab the canvas element by getElementById and pass it into my small lib. The lib itself should handle everything around it. Manipulation and basic interaction with the lib while initializing should be placed in app.js, which I cover later.

 

For now let’s create the basic class for the lib.

I created a basic class with the attributes which are essential (besides animationQueue – let’s ignore it for now).
This lib should accept any given object which follows some interface (covered later in the base model) through the addObject method. I don’t want to link everything everywhere, so I am pushing only the canvas context and the basic color setting into every single new object.

 

The constructor doesn’t have any specialities, it just accepts the canvas element and initializes some arrays. The color will later be part of more robust default options, which could be altered by some other method.
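
A sketch of it (Lib as the class name and the default color are placeholders):

function Lib(canvasElement) {
    this.canvas = canvasElement;
    this.ctx = canvasElement.getContext('2d');
    this.objects = [];
    this.animationQueue = [];    // ignore this one for now
    this.color = '#4688f1';      // later part of more robust default options
}

// accepts any object which follows the base model interface and pushes
// only the canvas context and the basic color setting into it
Lib.prototype.addObject = function (object) {
    object.ctx = this.ctx;
    object.color = this.color;
    this.objects.push(object);
};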

 

The canvas HTML element represents a real painting canvas like the ones you can see everywhere – you pick some brush and paint some lines. When finished, or not satisfied, you just take the layer and throw it away. So you can’t just pick some object you painted and alter it or remove it. Simple as that, so the lib should provide two other basic functions, renderCycle and clean, for painting the whole layer and cleaning it respectively. With the models gaining a reference to the context, they can easily manage the rendering on their own. The lib just picks, in a cycle, the objects that should be rendered right away.

 

renderCycle should be called once and then work on its own. That is guaranteed by the requestAnimationFrame part, which basically registers an event which is fired before the browser repaints the scene (on the browser level). This guarantees a smoother framerate. Also, the event is not eternal – that’s why renderCycle is called again in the callback.
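
A sketch of both functions:

Lib.prototype.clean = function () {
    this.ctx.clearRect(0, 0, this.canvas.width, this.canvas.height);
};

Lib.prototype.renderCycle = function () {
    var self = this;
    this.clean();
    // every object manages its own rendering, the lib only cycles through them
    this.objects.forEach(function (object) {
        object.render();
    });
    // the registered callback fires only once, so register it again for the next frame
    window.requestAnimationFrame(function () {
        self.renderCycle();
    });
};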

 

For better portability between browsers, you should consider using something like this:

var requestAnimationFrame = window.requestAnimationFrame || window.mozRequestAnimationFrame || window.webkitRequestAnimationFrame || window.msRequestAnimationFrame;

 

Now let’s create the base model:

Because I want everything to be inherited, I don’t need to initialize anything in the constructor. That means that for attributes which should be present in the extended object, I had to create a special method on the prototype which only initializes those attributes. Next there is an empty render method. Each and every model should have its own render method, because the logic ought to be different. Here I don’t need to specify anything – it works just like an interface in other languages that support interfaces.

Also, I prepared the methods on & event, which stand for on & emit, as in e.g. the Node.js events module. on registers a callback that should be called when the specified event is triggered. Additional arguments are just for convenience.
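
A sketch of the base model:

function Base() {}

// initializes the attributes every extended object should have
Base.prototype.init = function () {
    this.animationQueue = [];
    this.callbacks = {};
};

// empty render - works like an interface, every model overrides it
Base.prototype.render = function () {};

// on & event stand for on & emit, as in the node.js events module
Base.prototype.on = function (name, callback) {
    this.callbacks[name] = this.callbacks[name] || [];
    this.callbacks[name].push(callback);
};

Base.prototype.event = function (name) {
    var args = Array.prototype.slice.call(arguments, 1);
    (this.callbacks[name] || []).forEach(function (callback) {
        callback.apply(null, args);
    });
};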

 

So here is the basic plan for models:

  • Each model should follow the base model’s render function, which repeatedly, hopefully at 60 fps (pc master race!), paints its part onto the canvas.
  • A model could have animation queues.
  • When there are no more animations, the render function is still called. And because the canvas is always cleaned before each render cycle, the object should repaint its current position (let’s call this a stoic repaint).
  • An animation should have a starting point and could have an ending point. After the animation, the current position has to be updated so the next animation (even if stoic) uses this new position.
  • Each animation should only need to be called from the outside once. Everything else should be handled inside the model.

 

Node model:

This is the first version of the node model. As a single model, it defines its own attributes pos and nextPos (since this node should be moving in something like animations). It also has a defaults object, which for now consists only of the value for the maximum iteration count – more iterations mean a longer animation. This node model has one animation method – moveTo. It creates an anonymous function which is called in every iteration. After it ends, it is disposed of by splicing the animation queue array.
This node model already has its render method in a usable form, with stoic rendering in case of an empty queue.
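
A sketch of it – the default of 100 iterations and the paint helper are my assumptions:

function Node(startPos) {
    this.init();    // from the base model
    this.defaults = { maxIterations: 100 };
    this.pos = startPos;
    this.nextPos = { x: startPos.x, y: startPos.y };
}

Node.prototype = Object.create(Base.prototype);

Node.prototype.moveTo = function (nextPos) {
    var self = this;
    var count = 0;
    var step = {
        x: (nextPos.x - this.pos.x) / this.defaults.maxIterations,
        y: (nextPos.y - this.pos.y) / this.defaults.maxIterations
    };
    this.nextPos = nextPos;
    // anonymous function called on every iteration of the render cycle
    var animation = function () {
        count++;
        self.pos.x += step.x;
        self.pos.y += step.y;
        self.paint();
        if (count >= self.defaults.maxIterations) {
            // dispose of the finished animation by splicing the queue
            self.animationQueue.splice(self.animationQueue.indexOf(animation), 1);
        }
    };
    this.animationQueue.push(animation);
};

// small helper painting the node at its current position
Node.prototype.paint = function () {
    this.ctx.beginPath();
    this.ctx.arc(this.pos.x, this.pos.y, 4, 0, Math.PI * 2);
    this.ctx.fillStyle = this.color;
    this.ctx.fill();
};

// stoic rendering when the queue is empty, otherwise run the queued animations
Node.prototype.render = function () {
    if (this.animationQueue.length === 0) {
        this.paint();
        return;
    }
    for (var i = 0; i < this.animationQueue.length; i++) {
        this.animationQueue[i]();
    }
};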

 

So let’s create the app.js script which finally utilizes the lib:
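
And a minimal app.js (the canvas id and the positions are placeholders):

var lib = new Lib(document.getElementById('canvas'));

var node = new Node({ x: 100, y: 100 });
lib.addObject(node);

// the first animation - move to a random position
node.moveTo({
    x: Math.random() * 800,
    y: Math.random() * 600
});

lib.renderCycle();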

After running index.html you should be able to see the first animation, which will differ each time you run it. Here is the first example.

I will cover more in the next article.

 

(images: http://rajasegar.deviantart.com/, http://phys.org/ )


Angular.js in WordPress?

As web designers / programmers want to create sites loaded with user-friendly features, the (still) favored way is to utilize Ajax functionality to some degree. But let’s say I want to use Ajax with PHP / WordPress. That is rather straightforward in WP – just copy the code from the Codex, which means create a callback function and an Ajax call using jQuery. It’s easy, we love it… but…

 

What if I have a large / more complicated application to code? Perhaps I don’t like jQuery for this usage and I want to use something more convenient for the current situation – for example Angular.js. I don’t want to create an exhaustive tutorial for creating Angular.js applications; here are just a few ideas and options.

 

And one thing – many people could tell you that using Angular in WordPress is overkill. I agree.

Second thing – many people could tell you that mixing Angular and jQuery is a bad idea (which you can’t completely avoid in WordPress). I agree.

The thing is – if you use it correctly and don’t mix it without thinking, then it is OK.

 

 

First of all, let’s assume that we have controllers, directives… all set. Now we should create some HTTP provider service. Nothing much, just so it can be of use.
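
A sketch of such a wrapper, assuming an existing module named app ($http already returns a $q promise, so the wrapper just passes it along):

angular.module('app').factory('httpService', ['$http', function ($http) {
    return {
        get: function (url) {
            return $http.get(url);
        },
        post: function (url, data, config) {
            return $http.post(url, data, config);
        }
    };
}]);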

I have inline-injected the core $http service, which is now wrapped in my service named httpService. The $http service is based on – or rather depends on – the $q service, which in return means I can expect a promise object to be returned. I utilize this in the controller (.then(…)). A call from a controller or something similar:
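
Roughly like this (inside a controller where httpService and $scope are injected):

httpService.get('/some-page').then(function (response) {
    $scope.page = response.data;
});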

Okay, run it. You should get the page you have requested. This was easy… nothing too hard. Now what about the POST method? Again, a call from a controller or something similar:
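
A sketch of the naive version – the endpoint is the standard /wp-admin/admin-ajax.php, the payload values are placeholders:

httpService.post('/wp-admin/admin-ajax.php', {
    action: 'get_group_product',
    product_id: 123
}).then(function (response) {
    console.log(response.data);   // '0' - WordPress did not recognize the call
});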

Well, if I were to evaluate the data received, I would get ‘0’, or false. The data which was sent would be in JSON format, which in my case means {action: get_group_product, product_id: xxx}. Why is that?

 

The answer lies in how WordPress accepts Ajax calls and how jQuery packs data for transfer. In this case, Angular isn’t doing anything bad, it just envelopes the data in a JSON container. So we need to do two things:

  • tell WordPress that this data really is in its favored form, that means as form (HTML element) data
  • give Angular, or better – the $http service, data encoded exactly how jQuery does it

In a practical example, it means changing and adding lines of code like this:
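
A sketch of the change – the data is form-encoded by jQuery.param and the header tells the server to treat it as regular form data:

httpService.post('/wp-admin/admin-ajax.php',
    jQuery.param({ action: 'get_group_product', product_id: 123 }),
    { headers: { 'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8' } }
).then(function (response) {
    console.log(response.data);
});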

If you wonder why I used jQuery like this while telling you not to mix it without thinking – it’s just an example to show you that this line can be replaced by an Angular equivalent – you could also use something like
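
(a suggestion on my part – the built-in $httpParamSerializerJQLike serializer exists since Angular 1.4 and can be injected like any other service)

$httpParamSerializerJQLike({ action: 'get_group_product', product_id: 123 })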

instead of that line.

With this, your request will be sent in the correct form for the WordPress Ajax functionality.

 

Next we need the backend Ajax callback, that is, if I use the POST method. In my case I want to call the get_group_product action with one parameter and I want to get the data back in JSON format – why not. This step is covered in the WordPress Codex, but for completeness:

With this, our Ajax call will be sent right into the get_group_product() function. You can see that I am using json_encode, which means the $http service will return the data as an Object. This object contains config data and also my awaited data, so for access be prepared to use data.data.some_var.

That’s it, you now have the basics, but there is always room for experimenting with Angular.js in WordPress.