Tuesday, December 27, 2011 

MySQL Cluster & replication.

A typical MySQL Cluster setup involves 3 components, with at least the following configuration:
  • 1 management (ndb_mgmd) node.
    • Management nodes contain the cluster configuration.
    • A management node is only needed to connect new storage and query nodes to the cluster and do some arbitration.
    • Existing storage and query nodes continue to operate normally if the management node goes down.
    • Therefore, it's relatively safe to have only 1 management node running on a very low spec machine (configuring 2 management nodes is possible but is slightly more complex and less dynamic).
    • Interfacing with a management node is done via the ndb_mgm utility.
    • Management nodes are configured using config.ini.
    • My setup here involves 1 management node.
  • 2 storage (ndbd) nodes.
    • You do not interface directly with those nodes; instead, you go through SQL nodes, described next.
    • It is possible to have more storage nodes than SQL nodes.
    • It is possible to host storage nodes on the same machines as SQL nodes.
    • It is possible, although not recommended, to host storage nodes on the same machines as management nodes.
    • Storage nodes will split up the data between themselves automatically. For example, if you want to store each row on 2 machines for redundancy (NoOfReplicas=2) and you have 6 storage nodes, your data is going to be split up into 3 distinct non-intersecting chunks, called node groups.
    • Given a correctly formulated query, it is possible to make MySQL scan all 3 chunks in parallel, thus returning the result set quicker.
    • Node groups are formed implicitly, meaning you cannot assign a storage node to a specific node group. What you can do, however, is manipulate the IDs of the nodes in such a way that the servers you want get assigned to the node groups you want. Nodes with consecutive IDs are assigned to the same node group until it contains NoOfReplicas nodes, at which point a new node group starts.
    • Storage nodes are configured using /etc/my.cnf. They are also affected by settings in config.ini on the management node.
    • My setup here involves 4 storage nodes.
  • 2 query (SQL) nodes.
    • SQL nodes are regular mysqld processes that access data in the cluster. You guessed it right – the data sits in storage nodes, and SQL nodes just serve as gateways to them.
    • Your application will connect to these SQL node IPs and will have no knowledge of storage nodes.
    • It is possible to have more SQL nodes than storage nodes.
    • It is possible to host SQL nodes on the same machines as storage nodes.
    • It is possible, although not recommended, to host SQL nodes on the same machines as management nodes.
    • SQL nodes are configured using /etc/my.cnf. They are also affected by settings in config.ini on the management node.
    • My setup here involves 4 SQL nodes.
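
The node group rule described above (consecutive node IDs fill a group until it holds NoOfReplicas nodes) can be sketched in a few lines of JavaScript. This only illustrates the rule as stated in this post: `assignNodeGroups` is a hypothetical helper, not part of any MySQL tool, and the node IDs in the example are made up.

```javascript
// Hypothetical helper: groups storage node IDs the way the rule above
// describes. Illustration only; MySQL Cluster does this internally.
// Assumption: the node count divides evenly by NoOfReplicas, as in the
// 6-node example above.
function assignNodeGroups(nodeIds, noOfReplicas) {
  if (nodeIds.length % noOfReplicas !== 0) {
    throw new Error('node count must be a multiple of NoOfReplicas');
  }
  // Sort numerically so that consecutive IDs end up next to each other.
  var sorted = nodeIds.slice().sort(function (a, b) { return a - b; });
  var groups = [];
  for (var i = 0; i < sorted.length; i += noOfReplicas) {
    groups.push(sorted.slice(i, i + noOfReplicas));
  }
  return groups;
}

// Six storage nodes with NoOfReplicas=2 form three node groups.
console.log(assignNodeGroups([2, 3, 4, 5, 6, 7], 2));
```

To put two particular servers into the same node group, you would give them adjacent IDs in config.ini.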

Friday, December 16, 2011 

Finding the prime numbers below a given number n.

I stumbled upon this question on Interviewstreet.com. It looked like a very basic question, but I attempted it anyway to try out their editor. It turned out to be a better question than I thought: the problem arises when N = 1000000, where both memory and execution time have to be managed.

To optimize execution time, I used a sieve-style algorithm for finding the prime numbers below a given number. To keep memory in check, I modified it further to work in steps of 100000.
-------------------------------------
Write a function getNumberOfPrimes which takes in an integer as its parameter, 
to return the number of prime numbers that are less than N

Sample Testcases:
Input #00:
100
Output #00:
25

Input #01:
1000000
Output #01:
78498
----------------------------------------
Below is the solution, written in PHP. I would like to comment it more thoroughly and optimize the loops further when I get some free time.

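function getNumberOfPrimes($n) {
    $count = 0;
    // Primes p with p*p < $n; they are reused to sieve the later segments.
    $prime = array();
    // Memory optimization: sieve the range in segments of 100000 numbers.
    $step = 100000;

    for ($start = 2; $start < $n; $start = $end) {
        // The current segment is the half-open range [$start, $end).
        // It ends at the next multiple of $step, or at $n for the last one,
        // so a trailing partial segment is not skipped.
        $end = min($n, ((int) floor($start / $step) + 1) * $step);
        $numbers = array_fill($start, $end - $start, true);

        // Cross out multiples of the primes found in earlier segments.
        foreach ($prime as $value) {
            if ($value * $value >= $end) {
                break; // larger primes have no uncrossed multiples below $end
            }
            // First multiple of $value that lies inside the segment.
            $first = (int) ceil($start / $value) * $value;
            for ($remove = $first; $remove < $end; $remove += $value) {
                $numbers[$remove] = false;
            }
        }

        // Sieve with the numbers of the segment itself. This loop only runs
        // for the first segment, where $start * $start < $end.
        for ($i = $start; $i * $i < $end; $i++) {
            if ($numbers[$i]) {
                for ($remove = $i * $i; $remove < $end; $remove += $i) {
                    $numbers[$remove] = false;
                }
            }
        }

        // Whatever is still marked true is prime.
        for ($i = $start; $i < $end; $i++) {
            if ($numbers[$i]) {
                $count++;
                if ($i * $i < $n) {
                    $prime[] = $i; // needed for sieving later segments
                }
            }
        }
    }

    return $count;
}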
-----------------------------------
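
For comparison, here is a minimal JavaScript sketch of the same idea, a segmented Sieve of Eratosthenes. It is not a port of the PHP above: the name `countPrimesBelow` and the optional `segmentSize` parameter are my own assumptions, and it first collects the "base" primes up to sqrt(n) and then reuses one fixed-size buffer for every segment.

```javascript
// Counts the primes strictly below n with a segmented Sieve of Eratosthenes.
// segmentSize is a tunable assumption; 100000 mirrors the step used above.
function countPrimesBelow(n, segmentSize) {
  segmentSize = segmentSize || 100000;
  if (n <= 2) return 0;

  // Step 1: plain sieve for the base primes p with p * p < n.
  var limit = Math.floor(Math.sqrt(n - 1));
  var isComposite = new Uint8Array(limit + 1);
  var basePrimes = [];
  for (var p = 2; p <= limit; p++) {
    if (isComposite[p]) continue;
    basePrimes.push(p);
    for (var m = p * p; m <= limit; m += p) isComposite[m] = 1;
  }

  // Step 2: sieve [2, n) one segment at a time, reusing a single buffer.
  var count = 0;
  var segment = new Uint8Array(segmentSize);
  for (var low = 2; low < n; low += segmentSize) {
    var high = Math.min(low + segmentSize, n); // segment is [low, high)
    segment.fill(0);
    for (var i = 0; i < basePrimes.length; i++) {
      var q = basePrimes[i];
      if (q * q >= high) break;
      // First multiple of q inside the segment, but never q itself.
      var first = Math.max(q * q, Math.ceil(low / q) * q);
      for (var k = first; k < high; k += q) segment[k - low] = 1;
    }
    for (var j = 0; j < high - low; j++) {
      if (!segment[j]) count++;
    }
  }
  return count;
}

console.log(countPrimesBelow(100));     // 25, as in sample test case #00
console.log(countPrimesBelow(1000000)); // 78498, as in sample test case #01
```

Memory stays bounded by the segment size plus the base primes, no matter how large n gets.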

Saturday, December 10, 2011 

JavaScript performance.



Scope management:
- Try to bring global variables that you use frequently into the local scope of the function.
- Avoid with statements (they add another object to the scope chain), and even catch clauses if there is a better way to handle the error.
- Use closures sparingly, as a closure's scope chain contains at least 3 objects.
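A small sketch of the first tip, assuming a global object named `settings` (a made-up name for this illustration). The global is read once into a local variable, so the lookups inside the loop resolve in the function's own scope. How much this helps depends on the JavaScript engine.

```javascript
// Hypothetical global, used only for this illustration.
var settings = { multiplier: 3 };

function scaleAll(values) {
  // Read the global once; inside the loop only local variables are used.
  var multiplier = settings.multiplier;
  var result = [];
  for (var i = 0, len = values.length; i < len; i++) {
    result.push(values[i] * multiplier);
  }
  return result;
}

console.log(scaleAll([1, 2, 3])); // [ 3, 6, 9 ]
```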
Data access:
- Literals and local variables are faster to access than object properties and array items. (In Firefox, arrays are faster than objects.)
Store these in a local variable:
- Any object property accessed more than once
- Any array item accessed more than once
=============================
function processData(data){
    if(data.count > 0){
        for(var i = 0; i < data.count; i++){
            processData(data.item[i]);
        }
    }
}
============================
The code below is 33% faster in IE than the previous version.
=============================
function processData(data){
    var count = data.count, item = data.item;
    if(count > 0){
        for(var i = 0; i < count; i++){
            processData(item[i]);
        }
    }
}
============================

Loops: Avoid unnecessary calculations inside loops. Anything that does not change between iterations should be computed once, before the loop starts.
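
As a rough illustration of the loop tip, the sketch below computes the array length and the maximum once, before the loop, instead of on every iteration. The function name `normalize` is invented for this example.

```javascript
// Scales every value by the largest one in the array.
function normalize(values) {
  var len = values.length;                // read the length once
  var max = Math.max.apply(null, values); // loop-invariant, computed once
  var result = new Array(len);
  for (var i = 0; i < len; i++) {
    result[i] = values[i] / max;          // only per-item work stays inside
  }
  return result;
}

console.log(normalize([1, 2, 4])); // [ 0.25, 0.5, 1 ]
```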
DOM:
HTMLCollection objects are slow. They are what getElementsByTagName(), getElementsByClassName(), document.forms, etc. return.
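
One common way to work around a slow collection is to copy it into a plain array once and loop over the copy. This is a minimal sketch; `toArray` is a hypothetical helper that accepts any array-like object, so it can be tried outside a browser too.

```javascript
// Copies an array-like object (for example the collection returned by
// getElementsByTagName) into a plain array in a single pass.
function toArray(collection) {
  var len = collection.length; // read the collection's length only once
  var items = new Array(len);
  for (var i = 0; i < len; i++) {
    items[i] = collection[i];
  }
  return items;
}

// In a browser: var divs = toArray(document.getElementsByTagName('div'));
```

The array is a snapshot, so it does not pick up elements added to the page afterwards.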

Minimize reflow:
Reflow happens on page load, browser resize, addition or removal of DOM nodes, style property changes on elements, and sometimes even when style properties are read.

- Instead of calling appendChild on the page inside a for loop, create a documentFragment before the loop, append the new nodes to the fragment inside the loop, and then append the fragment to the required element in one go.
- Updating the style object one property after another is slow. Instead, group the styles under a single class in a CSS file and change the element's class in one go using JavaScript.
- Accessing offsetWidth and similar layout properties might trigger a reflow; don't do it in loops.
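
The documentFragment tip could look roughly like this. `appendItems` is a made-up helper; it takes the document and the target element as parameters (an assumption made so the sketch does not depend on any particular page).

```javascript
// Builds all list items off-document and attaches them in one operation.
// doc is expected to be the page's document, target the element to fill.
function appendItems(doc, target, labels) {
  var fragment = doc.createDocumentFragment(); // created once, before the loop
  for (var i = 0, len = labels.length; i < len; i++) {
    var li = doc.createElement('li');
    li.appendChild(doc.createTextNode(labels[i]));
    fragment.appendChild(li); // the fragment is not part of the page yet
  }
  target.appendChild(fragment); // single insertion into the live DOM
}

// In a browser, with a hypothetical <ul id="menu">:
// appendItems(document, document.getElementById('menu'), ['Home', 'About']);
```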