Counting number of inversion in an array (python)

def count_inversion(array):
    (count,sorted_array) = count_inversion_helper(array)
    return count

def count_inversion_helper(array1):
    """"
    counts the number for inversion in one array we assume the inversions
    are related to sorted array from
     0 to n (pep8 tested with 10.0 score)
    :author name: Tanmay Datta
    :param array1: array to count (list type)
    :return: number of inversoin
    :type array1: List(Num)
    """"
    length = len(array1)
    if length <= 1:
        #Trivial case and no inversion
        return (0, array1)
    if length == 2:
        if array1[0] &amp< array1[1]:
            return (0, array1)
        else:
            return (1, [array1[1], array1[0]]) # sorted array
    else:
        half = length/2
        right_array = array1[half:]
        (left_inversion, left_array) = count_inversion_helper(array1[:half])
        (right_inversion, right_array) = count_inversion_helper(array1[half:])
        sorted_array = []
        left_idx = 0
        right_idx = 0
        left_size = len(left_array)
        right_size = len(right_array)
        total_inversions = left_inversion + right_inversion
        while True:
            if right_array[right_idx] < left_array[left_idx]:
                sorted_array.append(right_array[right_idx])
                total_inversions += left_size - left_idx
                right_idx += 1

            else:
                sorted_array.append(left_array[left_idx])
                left_idx += 1
            if left_idx >= left_size:
                sorted_array += right_array[right_idx:]
                break
            if right_idx >= right_size:
                sorted_array += left_array[left_idx:]
                break
        return (total_inversions, sorted_array)

Another funky implementation which uses a “bisect” function already given by python libraries

import bisect

def ms_funky(arr):

sorted_array = sorted(arr)

r = 0

for n in arr:

    pos = bisect.bisect_left(sorted_array,n)

    r += pos

    sorted_array.pop(pos)

return r

Advertisements

Merge sort in python (implementation,testing,re implementation, style fix)

I am taking one online course on algorithms and thought would write a programming tutorial (or at least my thought process) along with implementation of merge-sort algorithm.

————————— merge-sort rules——————————-

If array size is 1 it’s always sorted

If array size is 2 then sorting is trivial and could be done as (if a[1] < a[2] then good otherwise reverse it)

In all other case divide array in 2 parts and perform sort individually.

Once both sided are sorted the merge them back iteratively by checking each element one by one in  the half array

—————————————-First attempt ————————————————-

def merge_sort(arr):

l = len(arr)

if l <= 1:

    return arr

elif l ==2:

if arr[0] < arr[1]:

    return arr

else:

    return [arr[1], arr[0]]

else:

    half = l // 2

    left = merge_sort(arr[:half])

    right = merge_sort(arr[half:])

    sorted_arr = []

    for idx in xrange(0,half):

    if left[0] <= right[0]:

        v = left.pop(0)

    else:

        v = right.pop(0)

    sorted_arr.append(v)

    sorted_arr = sorted_arr + left + right

    return sorted_arr

Looks ok (hmmm!!!!)

As a programmer the most important thing that you need to check a code for is ____ –>

“Correctness”

let’s write some tests

What are good tests ?

1. Checks for “common” values correctness

2. Checks for “extreme” cases correctness

3. Checks for exceptions and correctness

from back of my mind I can think of following cases

test1 = [1,2,3,4,5,6] (Checks what happens when array is already sorted)

test2 = [4,3,1,7,6,5] (checks for the real sorting problem under normal kind of case (even numbers)

test3 = [4,3,1,7,6,5,2]  (checks similar to test 2 along just odd numbers)

test4 = [] (Extreme case when array is empty)

test5 = [4,4,4,4,4] (Extreme case when all elements are same)

test6 =[2,2,1,1,5,3,6,100] (Not extreme but good case having some duplicates)

———————————————Input————————————————-


test1 = [1,2,3,4,5,6]

test2 = [4,3,1,7,6,5]

test3 = [4,3,1,7,6,5,2]

test4 = []

test5 = [4,4,4,4,4]

test6 =[2,2,1,1,5,3,6,100]

print merge_sort(test1)

print merge_sort(test2)

print merge_sort(test3)

print merge_sort(test4)

print merge_sort(test5)

print merge_sort(test6)

——————————————output at first attempt——————————————


[1, 2, 3, 4, 5, 6]
[1, 4, 3, 5, 7, 6]
[1, 2, 4, 3, 5, 6, 7]
[]
[4, 4, 4, 4, 4]
[1, 1, 2, 2, 3, 5, 6, 100]

Hmm so the code has some bug for test2 and test3 😦

what happend?

The code has a logic flaw (instead of a for there should be a while) posting the code without explaining the logical flaw as I think that is quite trivial

def merge_sort(arr):

l = len(arr)

if l <= 1:

return arr

elif l ==2:

if arr[0] < arr[1]:

return arr

else:

return [arr[1], arr[0]]

else:

half = l // 2

left = merge_sort(arr[:half])

right = merge_sort(arr[half:])

sorted_arr = []

while len(left) != 0 and len(right) != 0:

if left[0] <= right[0]:

v = left.pop(0)

else:

v = right.pop(0)

sorted_arr.append(v)

sorted_arr = sorted_arr + left + right

return sorted_arr

——— So all the test passed after this ——

Next step –>

Fix style

Style depends a lot on programmers taste but there are certain things that everyone (at least every python programmer) would appreciate. So I am going to just touch those points

def merge_sort_withstyle(arr):

"""

The program is a style corrected version written by Tanmay for

mergesort

"""

length = len(arr)

sorted_arr = arr if length

return arr if length <= 1

if length <= 1:

return arr

elif length == 2:

return arr if arr[0] < arr[1] else [arr[1], arr[0]]

else:

half = length//2

left = merge_sort_withstyle(arr[:half])

right = merge_sort_withstyle(arr[half:])

while len(left) != 0 and len(right) != 0:

value = left.pop(0) if left[0] < right[0] else right.pop(0)

sorted_arr.append(value)

sorted_arr += left + right

return sorted_arr

Next step (as a mathematician and a really good programmer)

Optimize 

(have fun !! )

c++ 11 features with practical examples (for each or range based loop)

I am going to use a famous example of factory design pattern which creates two stooges.

Stooge class implementation along with factory method is here

#include<iostream>
#include<vector>
#include<functional>
#include<algorithm>
class Stooge {
public:
	static Stooge *make_stooge(int);
	virtual void tell_a_joke()=0;
};

class Jerry : public Stooge{
public:
	void tell_a_joke(){
		std::cout << "I am jerry the stooge \n";
	}
};

class Moe : public Stooge{
public:
	void tell_a_joke(){
		std::cout << "I am Moe the stooge \n";
	}
};
Stooge *Stooge::make_stooge(int choice)
{
	if (choice == 1)
	{
		return new Jerry;
	}
	else
		return new Moe;
}

In summary a factory pattern takes out burden from “main” function (class consumer normally) to write all the “new” and stuff.

now the main

int main()
	std::vector<Stooge*> circus;
	int choice;
	cin >> choice;
	while (choice != 0)
	{
		cin >> choice;
		circus.push_back(Stooge::make_stooge(choice));
	}

	for each (Stooge* st in circus) st->tell_a_joke();
	cout << " \n Previous to that you have to use mem_fun... see implementation in source code\n";
	for_each(circus.begin(), circus.end(), mem_fun(&Stooge::tell_a_joke));
	cin >> choice;
}

So you can see how using for each looks somewhat comparable to other languages (like python)
The older way (“for_each”) has to provide the address of the static function which requires using “mem_fun” function which in my opinion is ok but error prone

Python gotcha

While browsing for some of the python “gotcha” I realized that everyone has the list gotcha but in my opinion that is always incomplete Consider this code

def gotcha(bad_list=[]):
bad_list.append(3)
print bad_list
def gotcha_again(bad_list=[]):
bad_list = bad_list + [3]
print bad_list

print "calling gotcha_again"
gotcha_again()
gotcha_again()
print "calling gotcha"
gotcha()
gotcha()

What happens when you call these two functions twice ?

If you have read the gotcha and applied it without understanding the reason, then you might expect both answers to be same.

However they would print :
calling gotcha_again
[3]
[3]
calling gotcha
[3]
[3, 3]

The reason has to with the inconsistency in the python language core( again my opinion). Python language is confused (although less than c++) b/w functional and OO and procedural language. Sometimes it would follow the “purity” concept of the functional languages and sometime it would completely violates it.

One other example of this conflict is

list.sort()

sorted(list)

Why do you need two behavior for sorting list ?

…. May be for creating an interview question ;)…

Why should anyone care about consistency.

…It creates “well behaved” product..
While developing a framework I realized this fact the hard way. If all the functions have similar behavior then the need to go through the documentation for each functions is reduced considerably. You can make valid and reasonable assumptions about even those parts of codes which you have not thoroughly studied.

Example.. now if someone is not expert in python then because of this sorting irregularity he/she can never be sure if any function on list would change the input list or would return some value. You have to confirm it always (Not that confirming is bad but still…)

 

Beginner-Intermediate Unix notes

chmod — Usage : For changing file security.

Background: User permissions are shown in Unix (using ls -l) as a 10 char string

‘-rwxrw-r–‘

[1-3-3-3]

1. The first character signify file type [‘-‘ here signifies file]

2. Next three tells about user’s permission [rwx here means read write and execute]

3. Next three are for permissions of member with same group as user [r– here means users group has read and write but NO execute]

4. Next is for all the other members. [r– signifies only read permissions]

Permissions can be changed in two ways using chmod

1. chmod [g/u/o] [+/- ][r/w/x]

Here g is for “group”, u is for “user”, o is for “other”, + is for adding and – is for removing, r(read), w(write), x(eXecute) permissions.

2. the other mode is more interesting and a bit confusing for beginners. It’s called absolute mode. There are two things to keep in mind here

a. Weight of r/w/x

b. Order of various categories or users

Weight is r –> 4  w–> 2 and x–>1

Order is fixed as u(User)g(Group)o(Other)

so

chmod 755 <filename> means

“give u(7 = 4 + 2 + 1 => read+write+execute) g(5= 4 + 1 means read+execute)o(same as group) permissions for the file with name <filename>”

————————————————————————————————————————————————

Common files that are being used while writing shell scripts

1. /dev/tty –> points to the default terminal via which the shell script is running

2. /dev/null –> used when you don’t care about the output.

————————————————————————————————————————————————-

Standard I/O , Standard Error and Redirection

Unix takes input from Standard Input , prints back to Standard output and throws error to Standard Error and unless specified otherwise all points to the terminal. With these assumptions it has to be very flexible in changing these three to create real world scripts. This is done via “pipes” and redirections.

“>” operator makes the file following it the new standard output

“<” operator makes the file following it the new standard input.

“&>” for redirecting standard error in csh.

The next feature is pipe, which redirects “flow” from one command to another

The general syntax for using pipe is :

command [Arguments] | nextCommand | nextCommandAgain

so the output from the first command becomes the input to the next command which in turns can generate input to some other command.

————————————————————————————————————————————————

Tee

Sometimes its desirable that the output from one particular subroutine flows to more than one place, this problem could be handled via Tee

Syntax would look something like

<command1> [Arguments] Tee command2 | command3

so Tee would pass the output of “command1” to both “command2” and “command3”

————————————————————————————————————————————————

Export command “To create shell variables “persistent” ”

In real world the files are deep inside some directory structure example “\home\tanmay\dev\quanlib\core\maths\numerics\pdesolver.hs”

Would be cool if I can alias such long path or even better automate the whole changing directory to the level by calling say “goNumerics”

Also some cool applications/ ide uses some global variable to provide including or extra directories. [Erlang uses ERL_LIBS], then it’s probably a good idea to declare them always.

That’s where export command helps.

exporting in your “.profile” or “.bash” or “.bash_rc” file would make sure that whenever you start a new “bash” shell you get all these cool stuff/variable available for you.

———————————————————————————————————————————————–

Some important command-line characters

1. Background process (‘&’) –> This is one of the most used (by me ) character. Some applications (emacs for example) would always keep itself attached to the terminal in which it is running. Which makes the terminal unusable for other activities unless you exit emacs. Which is annoying if you are in deep directory/file and might want to run some some commands after some edits (compile for example).So solution is to start emacs with an ampersand(‘&’) which would makes it run in background

2. Command Grouping  [()] –> you want the first line of you file to be current timestamp and then following by the output of you shell script one way is to write it like

date >> output.txt

mycommand [args] >> output.txt

other way is to use command grouping as (date; mycommand [args]) >>output.txt

btw ‘>>’ is used for append rather then delete and create the file on the right side.

————————————————————————————————————————————————-

While automating your build, or may be some other task that you achieve using the shell script, there are two kind of actions that you would probably want to do. (In my opinion these two actions are exhaustive)

1. Run <command1> and then after it runs successfully run <command2> and so on

2. Run <command1> and then run <command2> and so on.

These two effects can be achieved by using conditional executions

&& for 1st case

and

|| for the second.

<command1> && <command 2>

<command1> || <command 2>

 

 

 

Funtional programming with Scala

My first impression of scala is that it’s too verbose. Thing that could be done in may be 1 line in haskell and 2 lines in erlang takes around 5-10 lines(Might not look like a big difference for 1 line but production codes easily span thousands of line even with the most expressive languages and here the factor is of 5). Anyways, I am still new so might not be my place to judge it.

Here goes implementation of some functions / practice problems in scala.

1. Pascal triangle such that pascal(r,c) such function return value at rth row and cth column

a. Haskel version used ZipWith [Signature ((A->A->A) -> [A] -> [A] -> [A]) which is not in scala so I wrote my own version.

(I don’t like the fact that I was not able to pattern match the arguments to map.. to do –> Find a way to pattern match the tuple)

def zipWith[A] (f: (A,A) => A, l1: List[A], l2: List[A]) : List [A] = {

val zipList = l1.zip(l2)

zipList.map(x=>f(x._1,x._2))

}

———————– Now goes my implementation of the above function —————————-

def pascal(c: Int, r: Int) : Int  {

def pascalH(pr: List[Int], currentRow :Int): List [Int] =

{

if (r==currentRow) pr else

{

val l1  = pr :+ 0

val l2 = List(0) ++ pr

val next_list = zipWith((x:Int,y:Int)=> x+y), l1,l2)

pascalH(next_list,cr+1)

}

val thePascalRow = pascalH(List(1),0)

thePascalRow(c)

}

—– Note that the helper definition works here because scala (as other functional languages) has lexical closure..

I am still new to scala to could not figure out how to use + instead of that lambda in the zipWith call — if you know please leave a comment

Debugging/Reverse engineering Linux shell script.

(Intended for someone[me] who don’t faint on sight of some assembler code and can look for relevant information from the dump of a binary).

(In draft)

1.Debugging

using option vx for debugging shell script might be a good idea when you can not find anything wrong by the visual inspection

[howTo]

say you have a shellscript called DoGreatStuff

called with parameters me you they $4 etc

call it as : set -vf DoGreatStuff me you they $4 etc

Example run for cd for list in my computer is

set -vx ls $1
set -vx ls $1
__vte_prompt_command
++ __vte_prompt_command
__vte_osc7)”
__vte_osc7)
__vte_osc7
+++ __vte_osc7
__vte_urlencode “${PWD}”)”
__vte_urlencode “${PWD}”)
__vte_urlencode “${PWD}”
++++ __vte_urlencode /home/tanmay
++++ LANG=C
++++ str=/home/tanmay
++++ ‘[‘ -n /home/tanmay ‘]’
++++ safe=/home/tanmay
++++ printf %s /home/tanmay
++++ str=
++++ ‘[‘ -n ” ‘]’
++++ ‘[‘ -n ” ‘]’
+++ printf ’33]7;file://%s%s\a’ tanmay /home/tanmay
++ printf ’33]0;%s@%s:%s07%s’ tanmay tanmay ‘~’ ”

Ignore other things but you can see that ls is a C program.. (Not a good example here but I hope explains the point here)