Tuesday, January 10, 2012 

Removing unnecessary .svn folders from

Found this cool script here to clear .svn folders on windows and a blog here  to delete the same on linux. Have fun.

Wednesday, January 04, 2012 

Java collections and generics.

Collections Framework.

General DS:
Arrays
Linked list
Doubly linked list.
- Array / linked list can be used as stack, queue, heap.

Interface hierarchy.
- Lists
-- Lists
-- ArrayLists
-- Vectors
-- LinkedList
- Set
- Map
-- Hashmap
-- Sorting
- Sorting Collections
-- Comprable
-- Comparator

Joshua Bloch - created the collections framework.

The Collections class is implemented by set,list and queue. and holds a list of static methods to work on the collection of objects.

General methods in collection.
Add object
Remove object
Search / Find object.
Retrieve object
Iterate over objects.

Collections framework hierarchy.









































List interface provides things like:
allows duplicates.
addObject
removeObject
iterate over them.

Set interface is about:
- Does not allow duplicates.










Tuesday, December 27, 2011 

Mysql cluster & replication.

A typical MySQL cluster setup involves 3 components in at least this configuration:
  • 1 management (ndb_mgmd) node.
    • Management nodes contain the cluster configuration.
    • A management node is only needed to connect new storage and query nodes to the cluster and do some arbitration.
    • Existing storage and query nodes continue to operate normally if the management node goes down.
    • Therefore, it's relatively safe to have only 1 management node running on a very low spec machine (configuring 2 management nodes is possible but is slightly more complex and less dynamic).
    • Interfacing with a management node is done via an ndb_mgm utility.
    • Management nodes are configured using config.ini.
    • My setup here involves 1 management node.
  • 2 storage (ndbd) nodes.
    • You do not interface directly with those nodes, instead you go through SQL nodes, described next.
    • It is possible to have more storage nodes than SQL nodes.
    • It is possible to host storage nodes on the same machines as SQL nodes.
    • It is possible, although not recommended, to host storage nodes on the same machines as management nodes.
    • Storage nodes will split up the data between themselves automatically. For example, if you want to store each row on 2 machines for redundancy (NoOfReplicas=2) and you have 6 storage nodes, your data is going to be split up into 3 distinct non-intersecting chunks, called node groups.
    • Given a correctly formulated query, it is possible to make MySQL scan all 3 chunks in parallel, thus returning the result set quicker.
    • Node groups are formed implicitly, meaning you cannot assign a storage node to a specific node group. What you can do, however, is manipulate the IDs of the nodes in such a way that the servers you want will get assigned to the node groups you want. The nodes having consecutive IDs get assigned to the same node group until there are NoOfReplicas nodes in a node group, at which point a node group starts.
    • Storage nodes are configured using /etc/my.cnf. They are also affected by settings in config.ini on the management node.
    • My setup here involves 4 storage nodes.
  • 2 query (SQL) nodes.
    • SQL nodes are regular mysqld processes that access data in the cluster. You guessed it right – the data sits in storage nodes, and SQL nodes just serve as gateways to them.
    • Your application will connect to these SQL node IPs and will have no knowledge of storage nodes.
    • It is possible to have more SQL nodes than storage nodes.
    • It is possible to host SQL nodes on the same machines as storage nodes.
    • It is possible, although not recommended, to host SQL nodes on the same machines as management nodes.
    • SQL nodes are configured using /etc/my.cnf. They are also affected by settings in config.ini on the management node.
    • My setup here involves 4 SQL nodes.

Friday, December 16, 2011 

Finding prime numbers with in a given number n.

I stumbled upon this question in Interviewstreet.com - looked very basic question, however I tried attempting it to check their editor. Later found that this was good question than I thought. The problem arises when N is = 1000000. We have to manage both memory and execution time. 

For execution time to be optimized, used this algo for finding the prime numbers with in a given number. For memory modified it further to work in steps of 100000.   
-------------------------------------
Write a function getNumberOfPrimes which takes in an integer as its parameter, 
to return the number of prime numbers that are less than N

Sample Testcases:
Input #00:
100
Output #00:
25

Input #01:
1000000
Output #01:
78498
----------------------------------------
Below is the solution. I have used php. I would add comments and optimize those loops further when I get some free time later. 

function  getNumberOfPrimes($n) {
    $count = 0;
    $prime = array();
    //optimization for memory split the number in to sub groups of 100000
    $step = 100000;
    $maxitr =  floor($n/$step);
    $itr = 0;
    $prime = array();
    $lastprime = 2;
    
    do{ 
    $numbers = array();
    
    if($itr){
     $start = (($itr)*$step)+1; 
    }else{
        $start =2;
    }
    if($itr == $maxitr){
     $end = $n;     
    }
    else{
     $end = ($itr+1)*$step;
    }
    
    $len = sqrt($end);
    $numbers = array_fill($start,($end-$start),true);  
    
    if(count($prime)){
    foreach($prime as $value){
     $multiplier =  floor($start/$value);     
      for($j=$value,$remove = $j*$multiplier;$remove<$end;$multiplier++){
       $remove = $j*$multiplier;
       $numbers[$remove]=false;
      }
    }
    $lastprime = end($prime);
    }
    
    for($i=$start;$i<=$len;$i++)
    {
     if($numbers[$i]){
      for($j=$lastprime,$remove = $j*$i;$remove<$end;$j++){
       $remove = $j*$i;
       $numbers[$remove]=false;
      }
     }
    }
    for($i=$start;$i<$end;$i++){
     if($numbers[$i]){
      $count++;
      $prime[] = $i;
     }
    }
    $itr++;
    }while($itr < $maxitr);
 
    return $count;
}
-----------------------------------

Saturday, December 10, 2011 

Javascript performance.



Scope maangement:
- Try to bring global variables that you use frequently to the local scope of the function.
- Avoid with statements (as they augment the scope chain further) / even catch statements if there is a better way to handle it.
- Use closure statements sparingly. As they would create min 3 scope chains
Data access
- Literals and variables are faster than objects and arrays. (firefox arrays are faster than objects)
Store these in a local variable:
– Any object property accessed more than once
– Any array item accessed more than once
=============================
function processData(data){
if(data.count > 0 ){
for(var i = 0; i<data.count;i++){
processData(data.item[i]);
}
}
}
============================
the below code is 33% faster in IE compared to the previous code.
=============================
function processData(data){
var count  = data.count, item = data.item;
if(count > 0 ){
for(var i = 0; i<count;i++){
processData(item[i]);
}
}
}
============================

Loops: Avoid unnecessary calculations in the loops. Make them faster.
DOM:
HTMLCollections object is slow - getElementsByTagName(), getElementsByClassName(), document.forms etc

Minimize the reflow:
It happens on - page load, browser resize, dom nodes addition removal, style property changes on elements, some times even access of style properties.

- So instead of doing append child inside a for loop, create a documentFragment inside for loop and then append it to the required element at one go.
- Updating style object one after the other is slow. Instead group them under a single class in a css file and change the css class at one go using javascript.
- accessing offset widths, might trigger reflow, dont do them in loops.

Saturday, November 26, 2011 

Sorting algorithms.

Sorting Algorithms:
Lets say we have pack of cards that are to be sorted.

Bubble sort:
Compare first with the second, choose the smaller and swap them. Then compare the second one with the third and so on. Once a iteration is done, repeat the steps again. Repeat the iterations till you have not done even a single swap in the last iteration.

Time complexity:
Worst case: O(n2)
Base case: O(n)
Average: O(n2)
Space complexity: Inplace O(1)
IsStable: Yes
Java implementation:




Selection sort: 
Find out the smallest number in the given set of cards, you do this by taking the first card, compare it with the next set of the cards one by one, if you find a smaller card than the current card, you swap the smaller one on to your hand and continue with the comparison, once the list is exhausted. The card in your hand is the smallest,  place it at the start of the list. Then continue to find the smallest in the next set. Once you do this for the entire list, the list would be in sorted order.

Worst case: O(n2)
Base case: O(n2)
Average: O(n2)
Space complexity: Inplace O(1)
IsStable: No (however with a with increased space complexity can be done)
Java implementation:

Insertion sort:

Insertion sort is like adding new card to an already ordered set of cards at hand.
You pick a key element which is the new number that you have picked up, then you go through the already arranged elements and compare them with key, if any number is greater than key, you move it to one place up, and once you dont find any number greater than key with in the current list, you insert the key at that place.

Worst case: O(n2)
Best case: O(n)
Average: O(n2)
Space complexity: Inplace O(1)
IsStable: yes
Javascript implementation:
function insertionsort() {
    var a = [].slice.call(arguments);
    var l = a.length;
    for (i = 1; i < l; i++) {
        k = i - 1;
        key = a[i];  // hold the current value of key in key.
        // check if the numbers which are sorted till now are greater than key, if you find any, move them one level up and insert the key there.
        while ((k > = 0) &&(a[k] > key)) {
            a[k + 1] = a[k];
            k--;
        }
        a[k + 1] = key;
    }
    console.log(a);
}

Shell sort:
Its hard to explain shell sort. Basically you choose some gaps and then pick numbers with in that gap and sort/swap them. And repeat this process till the start end. You can find the video explaining this


Another better one.


The time complexity of the shell sort depends on the gap you have choosen. For


sequence (2 pow k − 1) complexity is (N pow 3/2)
successive numbers of the form (2 pow p) or (3 pow q)) complexity is (N((logN)pow 2)) - ex: 1, 2, 3, 4, 6, 8, 9, 12
Best case: O(n)
Space complexity: Inplace O(1)
IsStable: no
Java implementation:


Quick sort:
Pick a pivot card, and partition the numbers into to numbers less than pivot card, numbers greater than pivot card. This can be done by comparing the

You then repeat this process on the left and right hand side partitions by choosing another pivot with in them. The process continues till you cant select a pivot.

Time complexity:
Worst case: O(n2) (happens when the elements are already sorted in the opposite order, to avoid this we generally choose a randomized pivot)
Best case: nlogn
Average: nlogn
Space complexity: log n
IsStable: depends ??
Java implementation:
public static class QuickSort{
  
  
  public void sort(int[] a, int left, int right){
   int p;
   if(left < right){
   p = partition(a, left,right);
   sort(a,left,p);
   sort(a,p+1,right);
   
   
   }
  }
  
  public int partition(int[] a, int left, int right){
   
   int pivot = a[right];
   int i=left;
   int j= right;
   int temp;
   while(true){
    while(a[i]<pivot){
     i++;
    }
    while(a[j]>pivot){
     j--;
    }
    
    if(i<=j){
    temp = a[i];
    a[i] = a[j];
    a[j] = temp;
    i++;
    j--;
    }
    else{
     return j;
    }
    
   }
   
  }
}
Merge sort:
Divide cards into two sublists of about half the size. Sort each sublist recursively. Till the list is of length 0 or 1, then it is already sorted. Merge the sublists back into one sorted list.
Time complexity:
 Worst case: O(n log n )
Best case: O(n log n )
Average: O(n log n )
Space complexity: n
IsStable: yes
Java implementation:
Heap sort: Create a heap structure and then keep reading the min of the heap. Once the min is picked, keep deleting the min element. That would return the sorted list. This cannot be demonstrated using the cards.
 Time complexity:
 Worst case: O(n log n )
Best case: O(n log n )
Average: O(n log n )
Space complexity: 1
IsStable: No
Java implementation:
Binary sort: Construct a binary tree and do a inorder traversal to print the sorted sequence.

Non Comparison sorts:

These sorts are done where in we are dealing only with number esp of some range. Hence better performance then nlogn is possible.
Radix sort:
Lsd radix sort, sort the numbers based on their least significat bits and repeat this process with each more significant bit.
 Example:
Original, unsorted list:
170, 45, 75, 90, 802, 24, 2, 66
Sorting by least significant digit (1st place) gives:
 170, 90, 802, 2, 24, 45, 75, 66
Notice that we keep 802 before 2, because 802 occurred before 2 in the original list, and similarly for pairs 170 & 90 and 45 & 75.
Sorting by next digit (10s place) gives:
802, 2, 24, 45, 66, 170, 75, 90
Sorting by most significant digit (100s place) gives:
2, 24, 45, 66, 75, 90, 170, 802
Note the sorting the last digits etc takes O(n) time only as we can use bucket sort.
 Time complexity:
 Worst case: O(n log n )
Best case: O(n log n )
Average: O(n log n )
Space complexity: 1
IsStable: No
Java implementation:
Bucket sort: Place each card in a buckets which are alread in sorted order, and then retrieve them as per the sort order of the buckets.
 Time complexity:
 Worst case: O(n+r) (r is the range of numbers to be sorted - basically number of buckets)
Best case: O(n+r)
Average:O(n+r)
Space complexity: O(n+r)
IsStable: Yes
Java implementation:
Count sort:
Asymptotic analysis: logn < n < n2 (n - square) <n3... < 2 to the power of n. big O notation. n2+2n+3n+logn = O(n2) ADT - Stack (with arrays) Stack implementation in Javascript. - Queue (with arrays / linked lists ) - Dqueue (with linked lists) - Dqueue can be used to implement stacks and queues - Circular lists with a sentinel is a efficient way to implement stacks / queues.?(this is interesting to implement) - Vector (ranks) - Position - List (based on positions) - Sequence (rank + positions) - Dictionary - Hash tables - Binary trees - Redblack trees - Avl trees - B-trees Binary Search: First sort and then divide and conquer the same to search O(log n base 2) Hash function: Hash codemap - key -> integer Hash compression map - integer -> [0, N-1] stack - creation tight strategy - (add constant) - f(N) = N+c  - O(n2) growth strategy - (double up) - f(N) = 2N  - O(n)

Thursday, November 24, 2011 

Multi tenant challenges.

The main reason for using multitenancy is to do remove maintenance night mare.


Views:
- Customizable with css and html (to an extent) - choosing a different css file for each tenant.
- In MVC drive the views based on tenant with each view for tenant in a different folder.
- All static files are stored under tenant specific folders - which is driven via a config file.

Business logic:
- Modularize the code and enable only few modules for each tenant.
- Inversion of control - using hooks.
- Dependency injection - driving the class names from config files based on the tenant.
- Generative programming. (while deploying the tenant, use admin tool to drive some of the code).

Data:
http://msdn.microsoft.com/en-us/library/aa479086.aspx

A database for each tenant
security- database conn is different so no problem
extending - create separate columns for the tenant if required. or have prefixed extendable columns.
scaling - scale the tenant database as required.

Single database with different table prefix for each tenant

security- database table prefix is different so not much problem if there is a single db class where we append prefix.
extending - create seperate columns for the tenant if required. or have prefixed extendable columns.
scaling - move the tables as required to a different database and scale it.


Single database with same tables with tenant id key.

security- Care should be taken to ensure the security of the queries.
extending - Have extrafields up front / have a separate table where you store extended columns. as explained here.
scaling - Use horizontal partitioning for tables using partition ids.


 

Web application security.

SQL injection:

The preferred option is to use a safe API which avoids the use of the interpreter entirely or provides a parameterized interface. Beware of APIs, such as stored procedures, that appear parameterized, but may still allow injection under the hood.
If a parameterized API is not available, you should carefully escape special characters using the specific escape syntax for that interpreter.
Positive or "whitelist" input validation with appropriate canonicalization also helps protect against injection, but is not a complete defense as many applications require special characters in their input.



Cross site scripting xss:

The preferred option is to properly escape all untrusted data based on the HTML context (body, attribute, JavaScript, CSS, or URL) that the data will be placed into. Developers need to include this escaping in their applications unless their UI framework does this for them. See the OWASP XSS Prevention Cheat Sheet for more information about data escaping techniques.
Positive or “whitelist” input validation is also recommended as it helps protect against XSS, but is not a complete defense as many applications must accept special characters. Such validation should decode any encoded input, and then validate the length, characters, and format on that data before accepting the input.

Using xss people can redirect users from your site to another, or get the session ids of your users and do session fixation etc.



Cross site forgery:

The easiest way to check whether an application is vulnerable is to see if each link and form contains an unpredictable token for each user. Without such an unpredictable token, attackers can forge malicious requests. Focus on the links and forms that invoke state-changing functions, since those are the most important CSRF targets.
- using posts and referrer header checks helps to an extent.



Failure to restrict private url access Insecure direct object references:
Just by chaning the parameters passed in the forms / urls users should not get access to what they are not supposed to access.
or just by pasting the internal urls users should not be able to view the internal pages with out login.
Broken session management: 
- You dont logout the user after some time of inactivity: If user forgets to logout, some one else can auto login.
- If you use session ids in the url, if another user gets access to that url in the meantime, he can use the session.
- Regenerate the session id on every request to avoid session fixation.
http://shiflett.org/articles/session-fixation


instead of



session_start();
if (!isset($_SESSION['count']))
{
   $_SESSION['count'] = 0;
}
else
{
   $_SESSION['count']++;
}
echo $_SESSION['count'];
?>

use



session_start();
if (!isset($_SESSION['initiated']))
{
   session_regenerate_id();
   $_SESSION['initiated'] = true;
}
?>



Password should be encrypted and stored:

Security misconfiguration:
directory listing.
older version of frameworks
exposing server version numbers.