Sunday, March 25, 2012

Undirected Graph

The following small program illustrate how to implement a undirected graph with costs to neighbor nodes.


#ifndef _GRAPH_HPP_
#define _GRAPH_HPP_

#include <iostream>
#include <map>
#include <list>

using namespace std;

/*
* Undirected graph class
*/
class Graph {
public:
Graph(unsigned int id=0) { m_id = id; m_Ecount = 0; }
virtual ~Graph();
void SetId(unsigned int id) { m_id = id; }
bool Match(Graph& g);
bool AddAdjacent(Graph* g, int cost=0);
bool AddAdjacentOne(Graph *g);
list GetAdjacents() { return m_Adjacents; }

int GetVCount() { return m_Adjacents.size(); }
int GetECount() { return m_Ecount; }
unsigned int GetId() { return m_id; }
void SetCostTo(Graph *g, int cost);
int GetCostTo(Graph* g);

private:
unsigned int m_id;
int m_Ecount;
map<Graph*,int> m_cost;
list<Graph*> m_Adjacents;
};


#endif




graph.pp:

#include "graph.hpp"

using namespace std;

Graph::~Graph()
{
// TODO:
m_Adjacents.clear();
}


bool Graph::Match(Graph& g)
{
cout << "this=" << this << endl;
cout << "&g =" << &g << endl;
if ((this == &g) || (g.GetId() == this->GetId()) &&
(g.GetECount() == this->GetECount()) &&
(g.GetVCount() == this->GetVCount()) )
return true;
return false;
}

bool Graph::AddAdjacentOne(Graph *g)
{
m_Adjacents.push_back(g);
//TODO: we need to tell g to add also edge to this object
m_Ecount++;
return true;
}


void Graph::SetCostTo(Graph *g, int cost)
{
if (g == NULL)
return;
// Use hash-table to set the cost from this node to g
m_cost[g] = cost;
}

int Graph::GetCostTo(Graph *g)
{
if (g == NULL)
return -1;
// use hash-table to get the cost from this node to g
return m_cost[g];
}

bool Graph::AddAdjacent(Graph* g, int cost)
{
if (!g) return false;

if (Match(*g))
{
cout << "ERROR: Tried to add itself" << endl;
return false;
}
else {
AddAdjacentOne(g);
// tell other node to set its adjacent to this node too
g->AddAdjacentOne(this);
// add cost from this object to g
g->SetCostTo(this, cost);
SetCostTo(g, cost);
return true;
}
}



int main()
{
Graph g[10];
int i;
int numOfVertices = 4;

for(i=0; i<numOfVertices; i++)
g[i].SetId(i);

// g0 --- g1
g[0].AddAdjacent(&g[1]);
//g1 --- g2
g[1].AddAdjacent(&g[2]);
// g2 --- g3
g[2].AddAdjacent(&g[3]);
// g3 --- g0
g[3].AddAdjacent(&g[0]);
// g0 -- g2
g[0].AddAdjacent(&g[2], 5);

list<Graph*>::iterator it;
list<Graph*> adjacents;

for(i=0; i<numOfVertices; i++)
{
cout << "NODE g[" << i << "]" << endl;
cout << "\tNumber of Edges in G[" << i << "] = " << g[i].GetECount() << endl;

adjacents = g[i].GetAdjacents();
cout << "\tNeighbors:" << endl;
for ( it=adjacents.begin() ; it != adjacents.end(); it++ )
cout << "\t\tNode=" << *it << ", g[" << (*it)->GetId() << "], cost="
<< g[i].GetCostTo(*it) << endl;
}
}

Friday, March 23, 2012

MIVO.TV Network Analysis

While my browser was playing the video streaming (tru flash-plugin) from mivo.tv,  a "netstat -t" revealed high traffic as below:


tcp        0      0 192.168.0.29:56448          68.68.29.24.:macromedia-fcs ESTABLISHED

After googling this "macromedia-fcs" I found:

TCP & UDP Port 1935 (Macromedia-fcs)



Technical description for port 1935:
The RTMP (Real Time Messaging Protocol) service associated with the computer port 1935 is a plain protocol utilized by the Adobe Macromedia Flash application. This proprietary service allows for the streaming of data, video and audio using an active Internet connection among a Flash Server and its associated player. This port supports the three variations of the RTMP service.

The communication port 1935 along with the TCP protocol is used for the delivery of encapsulated RTMPT that traverses firewall applications using HTTPS (secured) connections. The design of this protocol was intended to provide a persistent service for the Flash technology along with other programs like Adobe LiveCycle Data Services ES.

The raw TCP RTMP protocol uses one persistent connection for allowing the implementation of real time communication and smooth delivery of multimedia contents.

The protocol running on the port 1935 achieves this quality of transmission without affecting its ability to send bigger chunks of information including the fragmentation of data.


Digging deeper to the HTML code, I found a port in Javascript like this:


<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000"
     <codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,0,0" 
               width="610" height="878" id="MivoTV" align="top">'
         <param name="wmode" value="transparent">
         <param name="allowScriptAccess" value="always" />
         <param name="allowFullScreen" value="true" />
         <param name="SeamlessTabbing" value="false"/>
         <param name="movie" value="http://mivotv-static.com/swf/MivoTV.swf?r=99123"/>
         <param name="quality" value="high" />
         <param name="scale" value="noscale" />
         <param name="menu" value="true" />
         <param name="devicefont" value="false" />
         <param name="bgcolor" value="#ffffff" />
         <param name="name" value="MivoTV" />
         <embed src="http://mivotv-static.com/swf/MivoTV.swf?r=99999" quality="high" scale="noscale" bgcolor="#ffffff" width="610" height="878" name="MivoTV" align="top" allowScriptAccess="always" allowFullScreen="true" type="application/x-shockwave-flash" pluginspage="http://www.macromedia.com/go/getflashplayer" wmode="transparent" menu="true" devicefont="false" />
         
<font style="white-space: pre-wrap;"><font face="inherit">But why it doesn't work if I open the URL directly and put some random number at the end [0..99999]?</font></font></p>
<p>


<font style="white-space: pre-wrap;"><font face="inherit"><br></font></font></p>
</object></pre>



'
























But why it doesn't work if I open the URL directly and put some random number at the end [0..99999]?







Hash template in C++

 

#include <map>
#include <iostream>
#include <cstring>

using namespace std;

struct eqstr {
bool operator() (const char *s1, const char *s2) const {
return (::strcmp(s1, s2) < 0 ? true : false);
}
};


int main()
{
// template
map <const char *, int, eqstr> months;

months["january"] = 31;
months["february"] = 28;
months["march"] = 31;
months["april"] = 30;
months["may"] = 31;
months["june"] = 30;
months["july"] = 31;
months["august"] = 31;
months["september"] = 30;
months["october"] = 31;
months["november"] = 30;
months["december"] = 31;

cout << "february -> " << months["february"] << endl;
cout << "september -> " << months["september"] << endl;
cout << "april -> " << months["april"] << endl;
cout << "july -> " << months["july"] << endl;
cout << "november -> " << months["november"] << endl;

for(map::iterator it=months.begin();
it != months.end(); it++)
{
cout << (*it).first << " => " << (*it).second << endl;
}
}

Wednesday, March 21, 2012

ELF File Format

The C++ source code below:


#include <cstdio>
#include <iostream>

using namespace std;

class CParent {
public:
CParent() {
ref++;
cout << "class CParent: constructor ref=" << ref << endl;
Function();
};
virtual ~CParent() {
ref--;
cout << "class CParent: destructor ref=" << ref << endl;
}
virtual int Function() {
cout << "\tA::Function()" << endl;
return ref;
}
static unsigned int ref;
};


class CChild : public CParent {
public:
CChild() {
ref++;
cout << "\tclass CChild: constructor ref=" << ref << endl;
m_a = new CParent();
};
virtual ~CChild() {
ref--;
cout << "\tclass CChild: destructor ref=" << ref << endl;
delete m_a;
}

static unsigned int ref;
private:
CParent *m_a;
};


unsigned int CParent::ref = 0;
unsigned int CChild::ref = 0;

int main()
{
int i;
int n = 2;

CChild *c = new CChild[n];
for(i=0; i(&c[i]);
printf("p[%d].Function()=%d\n", i, p->Function());
}
cout << "CChild::Ref=" << CChild::ref << endl;
delete [] c;
}

A in-depth view of the generated ELF file format from the above C++ code is as below:


[buhadram@localhost src]$ readelf -a virt
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x80487d0
Start of program headers: 52 (bytes into file)
Start of section headers: 5192 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 8
Size of section headers: 40 (bytes)
Number of section headers: 31
Section header string table index: 28

Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 08048134 000134 000013 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 08048148 000148 000020 00 A 0 0 4
[ 3] .note.gnu.build-i NOTE 08048168 000168 000024 00 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0804818c 00018c 000040 04 A 5 0 4
[ 5] .dynsym DYNSYM 080481cc 0001cc 000160 10 A 6 1 4
[ 6] .dynstr STRTAB 0804832c 00032c 000214 00 A 0 0 1
[ 7] .gnu.version VERSYM 08048540 000540 00002c 02 A 5 0 2
[ 8] .gnu.version_r VERNEED 0804856c 00056c 000080 00 A 6 3 4
[ 9] .rel.dyn REL 080485ec 0005ec 000020 08 A 5 0 4
[10] .rel.plt REL 0804860c 00060c 000080 08 A 5 12 4
[11] .init PROGBITS 0804868c 00068c 000030 00 AX 0 0 4
[12] .plt PROGBITS 080486bc 0006bc 000110 04 AX 0 0 4
[13] .text PROGBITS 080487d0 0007d0 0005dc 00 AX 0 0 16
[14] .fini PROGBITS 08048dac 000dac 00001c 00 AX 0 0 4
[15] .rodata PROGBITS 08048dc8 000dc8 000114 00 A 0 0 8
[16] .eh_frame_hdr PROGBITS 08048edc 000edc 000074 00 A 0 0 4
[17] .eh_frame PROGBITS 08048f50 000f50 00022c 00 A 0 0 4
[18] .gcc_except_table PROGBITS 0804917c 00117c 000041 00 A 0 0 1
[19] .ctors PROGBITS 0804a1c0 0011c0 00000c 00 WA 0 0 4
[20] .dtors PROGBITS 0804a1cc 0011cc 000008 00 WA 0 0 4
[21] .jcr PROGBITS 0804a1d4 0011d4 000004 00 WA 0 0 4
[22] .dynamic DYNAMIC 0804a1d8 0011d8 0000e0 08 WA 6 0 4
[23] .got PROGBITS 0804a2b8 0012b8 000004 04 WA 0 0 4
[24] .got.plt PROGBITS 0804a2bc 0012bc 00004c 04 WA 0 0 4
[25] .data PROGBITS 0804a308 001308 000004 00 WA 0 0 4
[26] .bss NOBITS 0804a320 00130c 000120 00 WA 0 0 32
[27] .comment PROGBITS 00000000 00130c 00002c 01 MS 0 0 1
[28] .shstrtab STRTAB 00000000 001338 00010e 00 0 0 1
[29] .symtab SYMTAB 00000000 001920 000680 10 30 49 4
[30] .strtab STRTAB 00000000 001fa0 0005a2 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x00100 0x00100 R E 0x4
INTERP 0x000134 0x08048134 0x08048134 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0x011bd 0x011bd R E 0x1000
LOAD 0x0011c0 0x0804a1c0 0x0804a1c0 0x0014c 0x00280 RW 0x1000
DYNAMIC 0x0011d8 0x0804a1d8 0x0804a1d8 0x000e0 0x000e0 RW 0x4
NOTE 0x000148 0x08048148 0x08048148 0x00044 0x00044 R 0x4
GNU_EH_FRAME 0x000edc 0x08048edc 0x08048edc 0x00074 0x00074 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4

Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame .gcc_except_table
03 .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag .note.gnu.build-id
06 .eh_frame_hdr
07

Dynamic section at offset 0x11d8 contains 23 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libstdc++.so.6]
0x00000001 (NEEDED) Shared library: [libm.so.6]
0x00000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x00000001 (NEEDED) Shared library: [libc.so.6]
0x0000000c (INIT) 0x804868c
0x0000000d (FINI) 0x8048dac
0x6ffffef5 (GNU_HASH) 0x804818c
0x00000005 (STRTAB) 0x804832c
0x00000006 (SYMTAB) 0x80481cc
0x0000000a (STRSZ) 532 (bytes)
0x0000000b (SYMENT) 16 (bytes)
0x00000015 (DEBUG) 0x0
0x00000003 (PLTGOT) 0x804a2bc
0x00000002 (PLTRELSZ) 128 (bytes)
0x00000014 (PLTREL) REL
0x00000017 (JMPREL) 0x804860c
0x00000011 (REL) 0x80485ec
0x00000012 (RELSZ) 32 (bytes)
0x00000013 (RELENT) 8 (bytes)
0x6ffffffe (VERNEED) 0x804856c
0x6fffffff (VERNEEDNUM) 3
0x6ffffff0 (VERSYM) 0x8048540
0x00000000 (NULL) 0x0

Relocation section '.rel.dyn' at offset 0x5ec contains 4 entries:
Offset Info Type Sym.Value Sym. Name
0804a2b8 00000206 R_386_GLOB_DAT 00000000 __gmon_start__
0804a320 00000f05 R_386_COPY 0804a320 _ZTVN10__cxxabiv117__c
0804a360 00001405 R_386_COPY 0804a360 _ZSt4cout
0804a400 00001005 R_386_COPY 0804a400 _ZTVN10__cxxabiv120__s

Relocation section '.rel.plt' at offset 0x60c contains 16 entries:
Offset Info Type Sym.Value Sym. Name
0804a2c8 00000107 R_386_JUMP_SLOT 00000000 __cxa_atexit
0804a2cc 00000207 R_386_JUMP_SLOT 00000000 __gmon_start__
0804a2d0 00000407 R_386_JUMP_SLOT 00000000 _ZdlPv
0804a2d4 00000507 R_386_JUMP_SLOT 00000000 _ZNSt8ios_base4InitC1E
0804a2d8 00000607 R_386_JUMP_SLOT 00000000 __libc_start_main
0804a2dc 00001307 R_386_JUMP_SLOT 0804871c _ZNSt8ios_base4InitD1E
0804a2e0 00000707 R_386_JUMP_SLOT 00000000 _ZStlsISt11char_traits
0804a2e4 00000807 R_386_JUMP_SLOT 00000000 printf
0804a2e8 00000907 R_386_JUMP_SLOT 00000000 _ZNSolsEj
0804a2ec 00000a07 R_386_JUMP_SLOT 00000000 _Znwj
0804a2f0 00000b07 R_386_JUMP_SLOT 00000000 _Znaj
0804a2f4 00000c07 R_386_JUMP_SLOT 00000000 _ZdaPv
0804a2f8 00000d07 R_386_JUMP_SLOT 00000000 _ZNSolsEPFRSoS_E
0804a2fc 00001207 R_386_JUMP_SLOT 0804879c _ZSt4endlIcSt11char_tr
0804a300 00001507 R_386_JUMP_SLOT 080487ac __gxx_personality_v0
0804a304 00000e07 R_386_JUMP_SLOT 00000000 _Unwind_Resume

There are no unwind sections in this file.

Symbol table '.dynsym' contains 22 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FUNC GLOBAL DEFAULT UND __cxa_atexit@GLIBC_2.1.3 (2)
2: 00000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
3: 00000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
4: 00000000 0 FUNC GLOBAL DEFAULT UND _ZdlPv@GLIBCXX_3.4 (3)
5: 00000000 0 FUNC GLOBAL DEFAULT UND _ZNSt8ios_base4InitC1Ev@GLIBCXX_3.4 (3)
6: 00000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.0 (4)
7: 00000000 0 FUNC GLOBAL DEFAULT UND _ZStlsISt11char_traitsIcE@GLIBCXX_3.4 (3)
8: 00000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.0 (4)
9: 00000000 0 FUNC GLOBAL DEFAULT UND _ZNSolsEj@GLIBCXX_3.4 (3)
10: 00000000 0 FUNC GLOBAL DEFAULT UND _Znwj@GLIBCXX_3.4 (3)
11: 00000000 0 FUNC GLOBAL DEFAULT UND _Znaj@GLIBCXX_3.4 (3)
12: 00000000 0 FUNC GLOBAL DEFAULT UND _ZdaPv@GLIBCXX_3.4 (3)
13: 00000000 0 FUNC GLOBAL DEFAULT UND _ZNSolsEPFRSoS_E@GLIBCXX_3.4 (3)
14: 00000000 0 FUNC GLOBAL DEFAULT UND _Unwind_Resume@GCC_3.0 (6)
15: 0804a320 44 OBJECT WEAK DEFAULT 26 _ZTVN10__cxxabiv117__clas@CXXABI_1.3 (5)
16: 0804a400 44 OBJECT WEAK DEFAULT 26 _ZTVN10__cxxabiv120__si_c@CXXABI_1.3 (5)
17: 08048dcc 4 OBJECT GLOBAL DEFAULT 15 _IO_stdin_used
18: 0804879c 0 FUNC GLOBAL DEFAULT UND _ZSt4endlIcSt11char_trait@GLIBCXX_3.4 (3)
19: 0804871c 0 FUNC GLOBAL DEFAULT UND _ZNSt8ios_base4InitD1Ev@GLIBCXX_3.4 (3)
20: 0804a360 140 OBJECT GLOBAL DEFAULT 26 _ZSt4cout@GLIBCXX_3.4 (3)
21: 080487ac 0 FUNC GLOBAL DEFAULT UND __gxx_personality_v0@CXXABI_1.3 (5)

Symbol table '.symtab' contains 104 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 08048134 0 SECTION LOCAL DEFAULT 1
2: 08048148 0 SECTION LOCAL DEFAULT 2
3: 08048168 0 SECTION LOCAL DEFAULT 3
4: 0804818c 0 SECTION LOCAL DEFAULT 4
5: 080481cc 0 SECTION LOCAL DEFAULT 5
6: 0804832c 0 SECTION LOCAL DEFAULT 6
7: 08048540 0 SECTION LOCAL DEFAULT 7
8: 0804856c 0 SECTION LOCAL DEFAULT 8
9: 080485ec 0 SECTION LOCAL DEFAULT 9
10: 0804860c 0 SECTION LOCAL DEFAULT 10
11: 0804868c 0 SECTION LOCAL DEFAULT 11
12: 080486bc 0 SECTION LOCAL DEFAULT 12
13: 080487d0 0 SECTION LOCAL DEFAULT 13
14: 08048dac 0 SECTION LOCAL DEFAULT 14
15: 08048dc8 0 SECTION LOCAL DEFAULT 15
16: 08048edc 0 SECTION LOCAL DEFAULT 16
17: 08048f50 0 SECTION LOCAL DEFAULT 17
18: 0804917c 0 SECTION LOCAL DEFAULT 18
19: 0804a1c0 0 SECTION LOCAL DEFAULT 19
20: 0804a1cc 0 SECTION LOCAL DEFAULT 20
21: 0804a1d4 0 SECTION LOCAL DEFAULT 21
22: 0804a1d8 0 SECTION LOCAL DEFAULT 22
23: 0804a2b8 0 SECTION LOCAL DEFAULT 23
24: 0804a2bc 0 SECTION LOCAL DEFAULT 24
25: 0804a308 0 SECTION LOCAL DEFAULT 25
26: 0804a320 0 SECTION LOCAL DEFAULT 26
27: 00000000 0 SECTION LOCAL DEFAULT 27
28: 00000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
29: 0804a1c0 0 OBJECT LOCAL DEFAULT 19 __CTOR_LIST__
30: 0804a1cc 0 OBJECT LOCAL DEFAULT 20 __DTOR_LIST__
31: 0804a1d4 0 OBJECT LOCAL DEFAULT 21 __JCR_LIST__
32: 08048800 0 FUNC LOCAL DEFAULT 13 __do_global_dtors_aux
33: 0804a42c 1 OBJECT LOCAL DEFAULT 26 completed.5530
34: 0804a430 4 OBJECT LOCAL DEFAULT 26 dtor_idx.5532
35: 08048860 0 FUNC LOCAL DEFAULT 13 frame_dummy
36: 00000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
37: 0804a1c8 0 OBJECT LOCAL DEFAULT 19 __CTOR_END__
38: 08049178 0 OBJECT LOCAL DEFAULT 17 __FRAME_END__
39: 0804a1d4 0 OBJECT LOCAL DEFAULT 21 __JCR_END__
40: 08048d80 0 FUNC LOCAL DEFAULT 13 __do_global_ctors_aux
41: 00000000 0 FILE LOCAL DEFAULT ABS virt.cpp
42: 0804a43c 1 OBJECT LOCAL DEFAULT 26 _ZStL8__ioinit
43: 08048a08 64 FUNC LOCAL DEFAULT 13 _Z41__static_initializati
44: 08048a48 28 FUNC LOCAL DEFAULT 13 _GLOBAL__I__ZN7CParent3re
45: 0804a2bc 0 OBJECT LOCAL DEFAULT 24 _GLOBAL_OFFSET_TABLE_
46: 0804a1bd 0 NOTYPE LOCAL DEFAULT 19 __init_array_end
47: 0804a1bd 0 NOTYPE LOCAL DEFAULT 19 __init_array_start
48: 0804a1d8 0 OBJECT LOCAL DEFAULT 22 _DYNAMIC
49: 0804a308 0 NOTYPE WEAK DEFAULT 25 data_start
50: 00000000 0 FUNC GLOBAL DEFAULT UND __cxa_atexit@@GLIBC_2.1.3
51: 08048d70 5 FUNC GLOBAL DEFAULT 13 __libc_csu_fini
52: 080487d0 0 FUNC GLOBAL DEFAULT 13 _start
53: 08048ebc 12 OBJECT WEAK DEFAULT 15 _ZTI6CChild
54: 00000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
55: 00000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
56: 08048dc8 4 OBJECT GLOBAL DEFAULT 15 _fp_hw
57: 00000000 0 FUNC GLOBAL DEFAULT UND _ZdlPv@@GLIBCXX_3.4
58: 0804a434 4 OBJECT GLOBAL DEFAULT 26 _ZN7CParent3refE
59: 08048dac 0 FUNC GLOBAL DEFAULT 14 _fini
60: 00000000 0 FUNC GLOBAL DEFAULT UND _ZNSt8ios_base4InitC1Ev@@
61: 00000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@@GLIBC_
62: 08048e88 20 OBJECT WEAK DEFAULT 15 _ZTV6CChild
63: 08048b56 49 FUNC WEAK DEFAULT 13 _ZN7CParent8FunctionEv
64: 08048cea 30 FUNC WEAK DEFAULT 13 _ZN6CChildD0Ev
65: 08048b88 171 FUNC WEAK DEFAULT 13 _ZN6CChildC2Ev
66: 0804871c 0 FUNC GLOBAL DEFAULT UND _ZNSt8ios_base4InitD1Ev@@
67: 00000000 0 FUNC GLOBAL DEFAULT UND _ZStlsISt11char_traitsIcE
68: 08048dcc 4 OBJECT GLOBAL DEFAULT 15 _IO_stdin_used
69: 0804a308 0 NOTYPE GLOBAL DEFAULT 25 __data_start
70: 0804a320 44 OBJECT WEAK DEFAULT 26 _ZTVN10__cxxabiv117__clas
71: 0804a360 140 OBJECT GLOBAL DEFAULT 26 _ZSt4cout@@GLIBCXX_3.4
72: 08048dd0 0 OBJECT GLOBAL HIDDEN 15 __dso_handle
73: 0804a1d0 0 OBJECT GLOBAL HIDDEN 20 __DTOR_END__
74: 08048d10 90 FUNC GLOBAL DEFAULT 13 __libc_csu_init
75: 00000000 0 FUNC GLOBAL DEFAULT UND printf@@GLIBC_2.0
76: 08048a64 100 FUNC WEAK DEFAULT 13 _ZN7CParentC2Ev
77: 00000000 0 FUNC GLOBAL DEFAULT UND _ZNSolsEj@@GLIBCXX_3.4
78: 08048ac8 112 FUNC WEAK DEFAULT 13 _ZN7CParentD1Ev
79: 00000000 0 FUNC GLOBAL DEFAULT UND _Znwj@@GLIBCXX_3.4
80: 00000000 0 FUNC GLOBAL DEFAULT UND _Znaj@@GLIBCXX_3.4
81: 08048a64 100 FUNC WEAK DEFAULT 13 _ZN7CParentC1Ev
82: 08048ed4 8 OBJECT WEAK DEFAULT 15 _ZTI7CParent
83: 08048b38 30 FUNC WEAK DEFAULT 13 _ZN7CParentD0Ev
84: 08048c34 182 FUNC WEAK DEFAULT 13 _ZN6CChildD2Ev
85: 0804a30c 0 NOTYPE GLOBAL DEFAULT ABS __bss_start
86: 0804a400 44 OBJECT WEAK DEFAULT 26 _ZTVN10__cxxabiv120__si_c
87: 0804a438 4 OBJECT GLOBAL DEFAULT 26 _ZN6CChild3refE
88: 08048ec8 9 OBJECT WEAK DEFAULT 15 _ZTS7CParent
89: 08048ac8 112 FUNC WEAK DEFAULT 13 _ZN7CParentD2Ev
90: 08048eb4 8 OBJECT WEAK DEFAULT 15 _ZTS6CChild
91: 08048ea0 20 OBJECT WEAK DEFAULT 15 _ZTV7CParent
92: 00000000 0 FUNC GLOBAL DEFAULT UND _ZdaPv@@GLIBCXX_3.4
93: 0804a440 0 NOTYPE GLOBAL DEFAULT ABS _end
94: 00000000 0 FUNC GLOBAL DEFAULT UND _ZNSolsEPFRSoS_E@@GLIBCXX
95: 08048b88 171 FUNC WEAK DEFAULT 13 _ZN6CChildC1Ev
96: 0804879c 0 FUNC GLOBAL DEFAULT UND _ZSt4endlIcSt11char_trait
97: 0804a30c 0 NOTYPE GLOBAL DEFAULT ABS _edata
98: 08048c34 182 FUNC WEAK DEFAULT 13 _ZN6CChildD1Ev
99: 080487ac 0 FUNC GLOBAL DEFAULT UND __gxx_personality_v0@@CXX
100: 00000000 0 FUNC GLOBAL DEFAULT UND _Unwind_Resume@@GCC_3.0
101: 08048d75 0 FUNC GLOBAL HIDDEN 13 __i686.get_pc_thunk.bx
102: 08048884 388 FUNC GLOBAL DEFAULT 13 main
103: 0804868c 0 FUNC GLOBAL DEFAULT 11 _init

Histogram for `.gnu.hash' bucket list length (total of 3 buckets):
Length Number % of total Coverage
0 0 ( 0.0%)
1 0 ( 0.0%) 0.0%
2 2 ( 66.7%) 57.1%
3 1 ( 33.3%) 100.0%

Version symbols section '.gnu.version' contains 22 entries:
Addr: 0000000008048540 Offset: 0x000540 Link: 5 (.dynsym)
000: 0 (*local*) 2 (GLIBC_2.1.3) 0 (*local*) 0 (*local*)
004: 3 (GLIBCXX_3.4) 3 (GLIBCXX_3.4) 4 (GLIBC_2.0) 3 (GLIBCXX_3.4)
008: 4 (GLIBC_2.0) 3 (GLIBCXX_3.4) 3 (GLIBCXX_3.4) 3 (GLIBCXX_3.4)
00c: 3 (GLIBCXX_3.4) 3 (GLIBCXX_3.4) 6 (GCC_3.0) 5 (CXXABI_1.3)
010: 5 (CXXABI_1.3) 1 (*global*) 3 (GLIBCXX_3.4) 3 (GLIBCXX_3.4)
014: 3 (GLIBCXX_3.4) 5 (CXXABI_1.3)

Version needs section '.gnu.version_r' contains 3 entries:
Addr: 0x000000000804856c Offset: 0x00056c Link: 6 (.dynstr)
000000: Version: 1 File: libgcc_s.so.1 Cnt: 1
0x0010: Name: GCC_3.0 Flags: none Version: 6
0x0020: Version: 1 File: libstdc++.so.6 Cnt: 2
0x0030: Name: CXXABI_1.3 Flags: none Version: 5
0x0040: Name: GLIBCXX_3.4 Flags: none Version: 3
0x0050: Version: 1 File: libc.so.6 Cnt: 2
0x0060: Name: GLIBC_2.0 Flags: none Version: 4
0x0070: Name: GLIBC_2.1.3 Flags: none Version: 2

Notes at offset 0x00000148 with length 0x00000020:
Owner Data size Description
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)

Notes at offset 0x00000168 with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
[buhadram@localhost src]$

Friday, March 16, 2012

Google Interview (3)

Question:
How would you reverse the image on an n by n matrix where each pixel is represented by a bit?

Answer:
I: n x n matrix
I[0][0] represents 1 byte at the left-most, top-most corner (8 pixels).

Just image it as columns and rows of pixels.  For example, a line of an image is represented on the screen as columns of pixels:

+----+----+----+ ....+-----+-----+---+
| p1 | p2 | p3 | |pm-2 | pm-1| pm |
+----+----+----+ ....+-----+-----+---+

In reversed image, it looks like below:


+---+----+----+ ....+----+----+----+
|pm |pm-1 |pm-2| | p3 | p2 | p1 |
+---+----+----+ ....+----+----+----+


The reverse version would have the column position reversed as well as the bit order in each bit. To do this, we need to read each row of the original image in reverse and also do bit reversal.

We have to be careful when swapping bytes above.  On 32-bit computers, we can swap left-right of every 32-bit integer of columns, or on 64-bit PCs we can do 64-bit (long integer) swapping before doing the bit reverse.

For example:

n = m/(sizeof(long integer)
for i=0 to n
    right_idx = m-1-i;
    swapLongInteger(image[i], image[right_idx])
    Reverse8Bit(image[i])
    Reverse8Bit(image[i+1])
    Reverse8Bit(image[i+2]) 
    Reverse8Bit(image[i+3]) 
    Reverse8Bit(image[right_idx])
    Reverse8Bit(image[right_idx-1])
    Reverse8Bit(image[right_idx-2])
    Reverse8Bit(image[right_idx-3])
    i = i + sizeof(long integer)
done

The call to multiple "Reverse8Bit" above can be put in multithreading/can be done using parallelism to speed up computation.

The pseudo-code is as below:

# I : list original image
# R : reversed image
for i in range(0,n):
for j in range(0,n):
R[i][j] = Reverse8Bit(I[i][n-1-j])

The Reverse8Bit() function is like below (from http://graphics.stanford.edu/~seander/bithacks.html#BitReverseObvious):

unsigned int v;     // input bits to be reversed
unsigned int r = v; // r will be reversed bits of v; first get LSB of v
int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end

for (v >>= 1; v; v >>= 1)
{
r <<= 1;
r |= v & 1;
s--;
}
r <<= s; // shift when v's highest bits are zero

Thursday, March 15, 2012

Google Interview (2)

Question:
How many degrees are there in the angle between the hour and minute hands of a clock when the time is a quarter past three?

Answer:
The stupid and quick answer would be 0 degree.

But, if we rethink again, it's not the case exactly.  When the minute  hand points to '15' (quarter), the hour hand has moved little bit.  How many? Let's analyze!

To make a full-circle (360 degrees rotation), the hour hand has to pass 12 hours, thus for one hour, it takes 360 degrees/12 = 30 degrees.  For a quarter hour = 1/4 * 30 = 7.5 degrees.

So the answer is: 7.5 degrees.

A Google Interview

Question:
If a person dials a sequence of numbers on the telephone, what possible words/strings can be formed from the letters associated with those numbers?

Answer:
Let's analyze this.

On a handset, there are keys with number '0' to '9'.  Assume we have to key in full phone number (area code + number),  it's gonna be 10 digits.  Only certain keys have alphabets: 2 [ABC], 3 [DEF], 4 [GHI], 5 [JKL], 6 [MNO], 7 [PQRS], 8 [WXYZ], with total number of sets = 8. The question is to find what is the total number of permutation of these 10-digits out of n set of possible phone numbers

Using permutation formula:


Formula:
 Note: , where nPr is the formula for permutations of objects taken r at a time.


The set of possible numbers then is: [222-222-2222 to 888-888-8888] or 6,666,666,666 possible phone numbers.

We then compute:
n = 10,000,000,000
r = 10

Permutation =6,666,666,666 !/(10!(6,666,666,666  - 10)!) = (6,666,666,656 * 6,666,666,657 * 6,666,666,658 *6,666,666,669 * 6,666,666,660 * 6,666,666,661 * 6,666,666,662 * 6,666,666,663 * 6,666,666,664 * 6,666,666,665 * 6,666,666,666)/3628800 = <just figure out your self!>

If the allowed valid number is only 1 digit, then computation would be easier:

Permutation = 6,666,666,666!/(1!*(6,666,666,666-1)!) = 6,666,666,666 possible words


Fast Hashing of Variable-length text strings

The following Hash algorithm based on a research paper written by Peter Pearson (a scientist at Lawrence Livermore National Lab) has an ability to generate unique hash numbers for strings with lengths no longer than 256 bytes. That means, we can have up to 256 buckets in a hash table, but it guarantees the uniqueness (free of hash-collision), thus no need to have linked-list in buckets.

The beauty of this algorithm is that it is designed to work on small microprocessors or microcontrollers, which sometimes don't have luxury to do complex arithmetics in their Instruction set (or, require more instructions).  Instead, the following algorithm uses XOR and table lookup which should be easy to do in small CPUs.  On more modern and high-end CPUs, XOR operation translated directly to mnemonic XOR in machine-code (a single instruction) and done even in almost wire-speed.

One of the use I can think of is for creating MAC table for virtual bridging.  An Ethernet MAC address takes only 6 octets, so this algorithm can be used to lookup 256-bucket MAC table.  The hash-key is MACaddr, while the output is port.

For example:

A MAC table structure might look like below:
+-------------------+---------+------+
| MACAddr + port | Age |
+-------------------+---------+------+
| XX:XX:XX:XX:XX:XX | 1 | 6 |
| YY:YY:YY:YY:YY:YY | 2 | 7 |
| ZZ:ZZ:ZZ:ZZ:ZZ:ZZ | 3 | 102 |
| AA:BB:BB:BB:BB:BB | 4 | 33 |
| AA:AA:AA:AA:AA:AA | 5 | 76 |
+-------------------+---------+------+

Learning:
port = PearsonHash(SourceMacAddr);
MACTable[port].addr = SourceMacAddr;
MACTable[port].port = port;

Forwarding:
inspect destination MAC address from incoming ethernet frame

//Flooding
if (DestMacAddr == Broadcast)
      Send to all ports, except the original incoming port, with this frame

// a small filtering is done here to prevent loop
if (DestMacAddr != SourceMacAddr)
{
     port = PearsonHash(DestMacAddr);
     if port is invalid:
          flooding
     else
     DataForward(EthernetFrame, port);
}

Aging:
The following statements can be executed periodically.

for each entry/bucket in the MAC table:
      if MACTable[i].age < 1:
            Flush(MACTable[i])

A challenge might be how to expand this algorithm to handle, say, 64000 buckets/entries?  How to generate the pseudorandom hash table?

#include <stdio.h>
#include <string.h>
#include <stdlib.h>


static unsigned char PseudoRandomHash[256] = {
1, 87, 49, 12, 176, 178, 102, 166, 121, 193, 6, 84, 249, 230, 44, 163,
14, 197, 213, 181, 161, 85, 218, 80, 64, 239, 24, 226, 236, 142, 38, 200,
110, 177, 104, 103, 141, 253, 255, 50, 77, 101, 81, 18, 45, 96, 31, 222,
25, 107, 190, 70, 86, 237, 240, 34, 72, 242, 20, 214, 244, 227, 149, 235,
97, 234, 57, 22, 60, 250, 82, 175, 208, 5, 127, 199, 111, 62, 135, 248,
174, 169, 211, 58, 66, 154, 106, 195, 245, 171, 17, 187, 182, 179, 0, 243,
132, 56, 148, 75, 128, 133, 158, 100, 130, 126, 91, 13, 153, 246, 216, 219,
119, 68, 223, 78, 83, 88, 201, 99, 122, 11, 92, 32, 136, 114, 52, 10,
138, 30, 48, 183, 156, 35, 61, 26, 143, 74, 251, 94, 129, 162, 63, 152,
170, 7, 115, 167, 241, 206, 3, 150, 55, 59, 151, 220, 90, 53, 23, 131,
125, 173, 15, 238, 79, 95, 89, 16, 105, 137, 225, 224, 217, 160, 37, 123,
118, 73, 2, 157, 46, 116, 9, 145, 134, 228, 207, 212, 202, 215, 69, 229,
27, 188, 67, 124, 168, 252, 42, 4, 29, 108, 21, 247, 19, 205, 39, 203,
233, 40, 186, 147, 198, 192, 155, 33, 164, 191, 98, 204, 165, 180, 117, 76,
140, 36, 210, 172, 41, 54, 159, 8, 185, 232, 113, 196, 231, 47, 146, 120,
51, 65, 28, 144, 254, 221, 93, 189, 194, 139, 112, 43, 71, 109, 184, 209};

#define N 256


int PearsonHash(const char *str)
{
int n = strlen(str);
int i;
unsigned char *h;
int result;

if (n > N)
{
puts("Pearson Hashing can only be used for string length <256 characters");
return -1;
}
h = (unsigned char *)malloc(N);
bzero(h, sizeof(char)*N);
h[0] = 0;
for(i=1; i<n; i++)
{
h[i] = PseudoRandomHash[(h[i-1] ^ str[i])& 0xFF];
//printf("h[%u]=%0x\n", i, h[i]);
result = h[i];
}
free(h);
return result;
}


int main()
{
int i;
int n;
const char MyStringBase[] = "Ahlan Wa Sahlan Bib, ";
char *StrTable[N];

for(i=0; i<N; i++)
{
printf(" %03u ", PseudoRandomHash[i]);
if ((i+1)%16 == 0)
printf("\n");
}
for(i=0; i<N; i++)
{
StrTable[i] = (char *)malloc(N);
sprintf(StrTable[i], "%s%u", MyStringBase, i);
}
for(i=0; i<N; i++)
{
printf("\n%s: \tHash Index=%u\n", StrTable[i], PearsonHash(StrTable[i]));
free(StrTable[i]);
}
}

Wednesday, March 14, 2012

Print Canonical Addresses of a Site

This 2 lines of Python queries DNS and display the canonical names of a host.  It's like a simple DNS lookup (nslookup).

#!/usr/bin/python

import sys
import socket

HOST=''
PORT=80

if (len(sys.argv) < 2):
print sys.argv[0], " "
sys.exit(0)

HOST = sys.argv[1]
for res in socket.getaddrinfo(HOST, PORT, socket.AF_UNSPEC, socket.SOCK_STREAM):
print res[4][0]



For example, if we execute the script and pass parameter 'www.google.com', we'd get this:

$ ./getdns.py www.google.com
74.125.224.50
74.125.224.51
74.125.224.48
74.125.224.49
74.125.224.52


Python GUI

This small Python script creates a window application and handles menu events as well as updating statusbar at the bottom.

#!/usr/bin/python

from Tkinter import *
import tkMessageBox

class StatusBar(Frame):

def __init__(self, master):
Frame.__init__(self, master)
self.label = Label(self, text="", bd=1, relief=SUNKEN, anchor=W)
self.label.pack(side=BOTTOM, fill=X)

def set(self, format, *args):
self.label.config(text=format % args)
self.label.update_idletasks()

def clear(self):
self.label.config(text="")
self.label.update_idletasks()

class App:

def __init__(self, parent):
self.this = self
frame = Frame(parent)
parent.protocol("WM_DELETE_WINDOW", ConfirmQuit)

parent.geometry("%dx%d%+d%+d" % (600, 400, 100, 50))

# create a menu
menu = Menu(parent)
parent.config(menu=menu)

filemenu = Menu(menu)
menu.add_cascade(label="File", menu=filemenu)
filemenu.add_command(label="New", command=self.newFile)
filemenu.add_command(label="Open...", command=self.OpenFile)
filemenu.add_separator()
filemenu.add_command(label="Exit", command=self.Exit)

runmenu = Menu(menu)
menu.add_cascade(label="Run", menu=runmenu)
runmenu.add_command(label="Run now", command=self.callback)

helpmenu = Menu(menu)
menu.add_cascade(label="Help", menu=helpmenu)
helpmenu.add_command(label="About...", command=self.About)

def callback(event):
print "callback was called from event class ",event.__class__
print "event.__name__=",__name__
print "event type:", type(event)
print "event.this.__class__=", event.this.__class__
print "this: type=", type(event.this), ",class=", event.this.__class__
print "status_bar", status_bar.__class__
status_bar.set("%s", "Ahlan bib")

def newFile(event):
print "New file", event
status_bar.set("%s", "Creating a new file")

def OpenFile(event):
print "Open File", event
status_bar.set("%s", "Opening a file")

def About(event):
print "(c) 2012, mlutfi", event
status_bar.clear()

def say_hi(self):
print "hi there, everyone!"

def Exit(event):
ConfirmQuit()

class BaseWindow:
def __init__(self, parent):
Toplevel()


def ConfirmQuit():
if tkMessageBox.askokcancel("Quit", "Do you really want to quit?"):
root.destroy()

root = Tk()
root.protocol("WM_DELETE_WINDOW", ConfirmQuit)

status_bar = StatusBar(root)
status_bar.pack(side=BOTTOM, fill=X)

app = App(root)

root.mainloop()

Sunday, March 11, 2012

Fibonacci in Python

# Fibonacci numbers module

def fib(n): # write Fibonacci series up to n
a, b = 0, 1
while b < n:
#print b,
a, b = b, a+b

def fib2(n): # return Fibonacci series up to n
result = []
a, b = 0, 1
while b < n:
result.append(b)
a, b = b, a+b
print b
return result

# make it executable as a script as well as module_exit
if __name__ == "__main__":
import sys
if (len(sys.argv) > 1):
f = fib2(int(sys.argv[1]))
str = ""
print "Fibonacci sequence: ", f
else:
print sys.argv[0], " "

Downloading book "Foundations of Computer Science"

This small python script would download all chapters in the book "Foundations of Computer Science".






#!/usr/bin/python

import sys
from subprocess import call # for 'call'


urlbase = 'http://infolab.stanford.edu/~ullman/focs/'

others = ['preface.pdf', 'toc.pdf', 'index.pdf']
chapters = range(1,15)
links = []
links.extend(others)

def download(index, f):
url = urlbase + f
print "file = ", index+1, url
call(["wget", url])


for ch in chapters:
links.append('ch' + '%02d' %(ch) + '.pdf')

out = [download(index,obj) for index,obj in enumerate(links)]

print "out=", out


Regular Expression in Python

The following is a Python snippet to do regular expression. First it reads a file passed as an argument, the it goes to a for-loop for each line and do string regular-expression search
#!/usr/bin/python

import sys # for sys.argv, etc.
import re # regular expression

if (len(sys.argv) > 1):
filename = sys.argv[1]
else:
print sys.argv[0], " "
sys.exit(0)


with open(filename, 'r') as f:
#f.read()
for line in f:
pattern = re.compile(r"print (.*)")
match = pattern.search(line)
if match:
print "print is used to print '", match.group(1), "'"

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ readfile.py 17,3-9 All "readfile.py" 21L, 386C

Computer Design Fun Party!

  1. To compute expression A = (B-C)*(D-E)


Using accumulator:

Load B ; acc = B; opcode=1 B, oper=2B; mem.traf = 2 B
Sub C ; acc = B – C; 1 byte opcode, 2-bytes operand = 3 bytes
Store A ; A = B -C; 1 byte opcode, 2-bytes operand = 3 bytes
Load D ; acc = D; 1 byte opcode, 2-bytes operand = 3 bytes
Sub E ; acc = D -E; 1 byte opcode, 2-bytes operand = 3 bytes
Mul A ; acc = (B-C)* (D-E) ; 1 byte opcode, 2-bytes operand = 3 bytes
Store A ; A = (B-C) * (D-E) ; 1 byte opcode, 2-bytes operand = 3 bytes

Assume each opcode takes 8-bit, operands are 16-bit and data is move in 16-bit chunks:
Total bytes: 7 * 3 byte = 21 bytes
Memory traffic: 21 + 7 * 2 = 35 bytes

Using load-store:

Ld B, r1 ; r1 = B; opcode=1 B, op = 2 B; mem. traff = 2 bytes
Ld C, r2 ; r2 = C; memory traffic = 2 bytes
Sub r1, r2,r3 ; r3 = r1 * r2 = B – C; size=2 bytes, memory traffic = 0
Ld D, r1 ; r1 = D; memory traffic = 2 bytes
Ld E, r2 ; r2 = E; memory traffic = 2 bytes
Sub r1,r2,r4 ; r4 = r1 * r2 = D-E; memory traffic = 0 bytes
Mul r3, r4, r1 ; r1 = r3 * r4 = (B-C) * (D-E); memory traffic = 0 bytes
St r1,A ; A = r1 = (B-C) * (D-E); memory traffic = 2 bytes

Number of bytes for instructions: 3*7 = 21 bytes
Memory traffic: 21 + 5 *2 = 31 bytes

Using stack:

Push B ; mem. Traf = 2 bytes
Push C ; mem. Traf = 2 bytes
Sub ; B – C; mem. Traf. = 2 bytes
Push D ; mem traf = 2 bytes
Push E ; mem.traf = 2 bytes
Sub ; top of the stack contains D-E, below it is B-C
Mul ; top of the stack now contains (B-C) * (D-E)

Each instruction occupies 3 bytes (1 byte opcode and 2-byte operand), so total instructions use 3*7 = 21 bytes.
Memory traffic: 21 + 2*7 = 35 bytes

  1. Register 0, 1, 2 (R0, R1, R2) are GPRs. R3 is initialized to 1 and make sure it is unchanged.
    1. Control sequence that forms the 2’s complement difference (or, R0 ← R0 + (-R1) = R0 + 2’s complement of R1) of the contents of R0 and R1:










Write Enable
A-bus enables
B-bus enables
FF1
Time
0
1
2
3
0
1
2
3
0
1
2
3
F0
F1
T
0
1
0
0
0
1
0
0
0
0
0
0
1
1
0
0
1
0
0
0
1
0
0
0
0
0
1
0
0
1
1
0
0
0
1
1
0
0
1
0
0
0
0
0
2

Explanation:
At T=0:
F0F1 = 11 (A’), A-bus enable = 1, B-bus enable = all zeros. Data (say, Y) is placed in A-bus. C-bus contains Y’, Write-enable = pin 1. So Y is written to register 1 (R1 = Y’).
At T = 1:
F0F1 = 00 (add), A-bus enable = pin 1 (content of R1 which is X, is in A-bus), B-bus enable= pin 3 (content of R3 which is 1 is in B-bus), write-enable = pin 1 (C-bus contains Y’ + 1). So R1 = Y’ + 1 = 2’s compl. Of X.(or R1 = -Y)
At T = 2
B-bus = 0, hence data from R0, say X, is placed in B-bus. F0F1 = 00 (add), A-bus enable = 1 (which contains –Y). C-bus contains R0 + R1 = X - Y . Write-enable = pin 0, so result is stored in R0.

    1. R0 XOR R1 (R0’ & R1 + R0 & R1’) = (R0 + R1)(R0’ + R1’), leaving the result in R0.











Write Enable
A-bus enables
B-bus enables
FF1
Time
Note
0 1 2 3
0
1
2
3
0
1
2
3
F0
F1
T
(Movements etc.)
0 0 1 0
1
0
0
0
0
1
0
0
0
0
0
R2 ← A + B
1 0 0 0
1
0
0
0
0
0
0
0
1
1
1
R0 ← A’
0 1 0 0
0
1
0
0
0
0
0
0
1
1
2
R1 ← B’
0 1 0 0
1
0
0
0
0
1
0
0
0
0
3
R1 ← A’ + B’
1 0 0 0
0
1
0
0
0
0
1

0
1
4
R0 ← (A+B)&(A’+B’)

  1. Booth Multiplication Algorithm:







/***********************************************************

Title: Booth Algorithm sample

Desc: To simulate booth multiplication algorithm

Author: Buhadram

Notes:



Ref:

- Computer Architecture: A Quantitative Approach

*************************************************************/

#include

#include

#include

#include



#define NUMTYPE short int

#define NUMBITS (8*sizeof(NUMTYPE))

#define MAX(x,y) ((x) > (y) ? x : y)

#define MIN(x,y) ((x) < (y) ? x : y)

#define BITMASK(x,mask) ((x) & mask)

/*********************************

proc: bitmask

desc: return 1 if bit[pos] is set

pos=0 for LSB, and p=sizeof(x)-1

for MSB

*********************************/

#define BIT(x,pos) ( (BITMASK(x,1<<(pos))) >> (pos) )

#define ASCII_BIT(x) ((x) + '0')



char *get_input(const char *prompt, char *buf, int length)

{

char *p;

int i;

printf("%s",prompt);

// fgets is safer to avoid buffer overflow

p = fgets(buf, length, stdin);

// remove newline character

buf[strlen(buf)-1] = 0;

return p;

}



/********************************************

* Swapbits

* Desc: To swap bit orders, e.g LSB becomes MSB

Note: b[0] is LSB, b[n-1] is MSB

********************************************/

void swapbits(NUMTYPE *b)

{

int i;

int tmp;



tmp = 0;

for(i=0; i base)

base = 10; /* can only use 0-9, A-Z */

tail = &buf[BUFSIZE - 1]; /* last character position */

*tail-- = '\0';



if (10 == base && N < 0L) {

*head++ = '-';

uarg = -N;

}

else uarg = N;



if (uarg) {

for (i = 1; uarg; ++i) {

register ldiv_t r;

r = ldiv(uarg, base);

*tail-- = (char)(r.rem + ((9L < r.rem) ? ('A' - 10L) : '0'));

uarg = r.quot;

}

}

else *tail-- = '0';

memcpy(head, ++tail, i);

return str;

}



#endif





/******************************

proc: btol

desc: converts binary string to

long integer

******************************/

long btol(const char *bin)

{

int n,i;

long d;



n = strlen(bin);

d = 0;

for(i=n-1; i>=0; i--) {

d += (bin[i]-'0') * (1 << i);

}

return d;

}



/**************************************

Proc: Ashift_bits_right

desc: perform arithmetic shift right on

bits in x, but preserving the sign bit.

***************************************/

void ashift_bits_right(long *x, int numbits)

{

int neg;



neg = (*x<0) ? 1 : 0;

*x >>= numbits;



if (neg)

//*x |= (1<>= 1;

qm = q0;

q >>= 1;

if (a0)

r |= b1;

b1 <<= 1;

}

return r;

}





int main(int argc, char *argv[])

{

#define BUF_SIZE (1024+1)

char *buf;

NUMTYPE A, B;

long res;

char str[256];



buf = (char *)malloc(BUF_SIZE+1);

assert(buf);

get_input("A = ", buf, BUF_SIZE);

A = atol(buf);

printf("A (in Binary) = %s\n", ltoa(A, str, 2));

get_input("B = ", buf, BUF_SIZE);

B = atol(buf);

printf("B (in Binary) = %s\n", ltoa(B, str, 2));

res = A+B;

printf("A + B = %ld = 0x%0X\n", res, res);

res = A - B;

printf("A - B = %ld = 0x%0X\n", res, res);

puts("\n\nBOOTH MULTIPLICATION\n");

res = booth_mult(A,B);

printf("A * B = %ld = 0x%0X = %sb\n", res, res, ltoa(B,str,2));

free(buf);

return 0;

}

Friday, March 9, 2012

Counting the number of "1" bits

Another "ridiculous" interview I had was when an interviewer asked me how to count the number of set bits in a 32-bit or 8-bit. I told him that there are various algorithms to do this, but the easiest way but yet good enough in performance is a function like this:

uint32_t count_bits(uint32_t b)
{
int i;
int n = 0;

for(i=0; i<sizeof(b)*8; i++)
{
n += (b & 1);
b >>= 1;
}
return n;
}


I also told him that there are many ways to do this (referred him to look at "Beautiful Code" to see some of the beautiful algorithms).

But then the interviewer argued.  FIrst, he said the upper border of the for-loop should be to "sizeof(b)*8" (meaning, instead of using "<", I should have used "<="). To bad, I said yes to him without thinking further (Probably got nervous.  Later at home, I checked that he was wrong).  Second, he said that the fastest is to use look-up table (n[0] = 0, n[1]=1, n[2]=1,...,n[255]=8,...). Although I agreed with his argument, but then he become inconsistent with his original question and there is a big downside of his argument. I said that this wouldn't be a choice in embedded programming where memory resource is limited. Secondly, his original question was to compute ones in a 32-bit number. That means we have to 2^32 (4,294,967,296 or about 4 GBytes) items set manually in an array. That's so ridiculous. Who wants to waste that much space just to calculate the number of bits in an integer? For a byte, although it seems acceptable ("only" 256 bytes of table), but compare that to the above algorithm:


movl $32,  %edx
xorl %ecx, %ecx
.L2:
movl %ecx, %ebx
shrl %ecx
andl $1, %ebx
addl %ebx, %eax
decl %edx
jne .L2

It's only 7 x86 mnemonic instructions (about 12-18 bytes) and no memory involvement in the loop!. Yes, it is O(32) for 32-bit, but it's a constant speed.  Even further, if the function is called only once, the fully optimized version of the instructions would be inserted inline in the place (without calling the function), so we can CPU time from doing push-pop stack frames. That's another tiny speedup.  It's probably fine for table lookup (yes, it's the fastest, O(1)), but it is limited to 8-bit.  For 16 or higher, seems the space-time trade-off is too big not to consider.  Besides, the above algorithm extendable into polymorphism in Object-Oriented code.

When I searched Google, I found the following (on Stanford's web):


v = v - ((v >> 1) & 0x55555555);                    // reuse input as temporary
v = (v & 0x33333333) + ((v >> 2) & 0x33333333); // temp
c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count


Which renders to instruction codes like below:

shrl %edx
andl $1431655765, %edx
subl %edx, %eax
movl %eax, %edx
shrl $2, %eax
andl $858993459, %edx
andl $858993459, %eax
addl %edx, %eax
movl %eax, %edx
shrl $4, %edx
leal (%edx,%eax), %eax
andl $252645135, %eax
imull $16843009, %eax, %eax
shrl $24, %eax

(14 mnemonics).  This seems the fastest and most optimized.  It took some time for me to understand it.

Seems the interviewer doesn't have enough experience in embedded or microcontroller  or doesn't think much about optimization. Nevertheless, I failed an interview again.   I start to think that many interview questions are really not to dig the thinking ability of a candidate, but rather to get instant answers meet their mind.  I just hope big software companies like IBM, Microsoft, Google or Apple shouldn't fall to this kind of trap, but rather ask logical questions to know if the candidate can think systematically.

What a waste!

Thursday, March 8, 2012

Parallelizable integer multiplication

At a recent interview with an Amazon, the hiring manager asked me about an algorithm to do integer multiplication without using any multiplication/division symbol in parallel computing.  The following algorithm was suggested to do integer multiplication without using any "MULT" instruction:


int mult(int x, int y)
{
int z;

if ((x==0) || (y==0))
return 0;
if (x==1)
return y;

if (y==1)
return x;

z = mult(x, y>>1);
if (y % 2 == 0)
return z<<1;
else
return x + (z<<1);
}

My question came up, is this parallelizable?  My argument to the interviewer was that this is not a good approach if he is asking from parallel computation perspective (See Robertazzi's papers on ACM journals to know more detail about parallel computation)

When a recursive call is made, the parameters are pushed into the stack before calling each subsequence recursive call.  Involving stack in this manner is hardly parallelizable, because there is coupling of current result based on previous results.  One of the principle of parallel computing is to decouple two or more computation as much as we can.

A dig into highly-optimized assembly language of the above code:




mult:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl %ebx, -8(%ebp)
movl %esi, -4(%ebp)
movl 12(%ebp), %ebx
movl 8(%ebp), %esi
testl %ebx, %ebx
je .L4
testl %esi, %esi
je .L4
cmpl $1, %esi
je .L2
cmpl $1, %ebx
.p2align 4,,3
je .L5
movl %ebx, %eax
movl %esi, (%esp)
sarl %eax
movl %eax, 4(%esp)
call mult
andl $1, %ebx
je .L7
leal (%esi,%eax,2), %ebx
.L2:
movl %ebx, %eax
movl -4(%ebp), %esi
movl -8(%ebp), %ebx
leave
ret
.p2align 4,,10
.p2align 3
.L4:
xorl %ebx, %ebx
movl -4(%ebp), %esi
movl %ebx, %eax
movl -8(%ebp), %ebx
leave
ret
.p2align 4,,10
.p2align 3
.L7:
leal (%eax,%eax), %ebx
movl -4(%ebp), %esi
movl %ebx, %eax
movl -8(%ebp), %ebx
leave
ret
.p2align 4,,10
.p2align 3
.L5:
movl %esi, %ebx
movl -4(%ebp), %esi
movl %ebx, %eax
movl -8(%ebp), %ebx
leave
ret


As can be seen, "SARL" (Shift Arithmetic Left) for the shift-left operation.  Some issues after analyzing this code.  First, there are multiple register-to-memory flow while doing the recursive:


movl %ebx, %eax
movl %esi, (%esp)
sarl %eax
movl %eax, 4(%esp)


Memory access is expensive and the instructions might cause cache miss thus forcing CPU to reread data from RAM.
Another issue, how do we parallelize this, if there is stack push-pop coupling for the result?


My argument was to decompose the computation into two (or more) portions.



For example:

int pivot = b/2;

for(i=0; i<pivot; i++)
sum1 += a;

for(i=pivot+1; i<b; i++)
sum2 += a;
sum = sum1 + sum2;


No, let's check what we can see in the x86 assembly (gcc-generated):

mult:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl 12(%ebp), %eax
movl %ebx, (%esp)
movl %esi, 4(%esp)
movl 8(%ebp), %edx
testl %eax, %eax
je .L5
testl %edx, %edx
je .L5
cmpl $1, %edx
je .L2
cmpl $1, %eax
.p2align 4,,3
je .L6
movl %eax, %ecx
xorl %esi, %esi
shrl $31, %ecx
xorl %ebx, %ebx
addl %eax, %ecx
sarl %ecx
testl %ecx, %ecx
jle .L3
movl %ecx, %esi
movl %ecx, %ebx
imull %edx, %esi
.L3:
xorl %ecx, %ecx
cmpl %ebx, %eax
jle .L4
movl %eax, %ecx
subl %ebx, %ecx
imull %edx, %ecx
.L4:
leal (%ecx,%esi), %eax
.L2:
movl (%esp), %ebx
movl 4(%esp), %esi
leave
ret
.p2align 4,,10
.p2align 3
.L5:
xorl %eax, %eax
movl (%esp), %ebx
movl 4(%esp), %esi
leave
ret
.p2align 4,,10
.p2align 3
.L6:
movl %edx, %eax
jmp .L2

As can be seen, most of operations are in registers, also there is decoupling between 2 partitions (e.g, sum1 does not rely on sum2). The more we partition the calculation (e.g, pivot1 = b/4, and do 4 for-next loops for each partition), the more divisible partition can be accomplished by parallel computer. I was arguing that to optimize computation, hence lowering O(n), we should use divisible loops ("divide-et-empera") instead of recursive approach (seems the interviewer disagreed)

PS: My argument above caused me not to land job at Amazon.