Category: Projects


Open source DropBox in Amazon EC2

Most of you probably know Dropbox already. In case you don't, this post should help you understand what Dropbox does.

Dropbox is basically an online backup system in which every user has an account. A folder named Dropbox is created in the user's documents folder (on Windows 7). Files and folders copied into this folder are synced to a back-end cloud storage system. (We do not care how Dropbox manages its users and stuff :D). A major advantage of Dropbox is that it syncs data to the cloud when the user's bandwidth is idle or under minimal usage. This post shows how you can use the Amazon EC2 cloud to run your own storage system for backing up data. With this method you can access your data from any place (provided you have an internet connection :D); in other words, your data becomes portable. Now let's get to the task.

I recommend using either CentOS or Ubuntu (10.10); which distribution you use is entirely up to you. Note that I access my files as root and skip some good practices, because no one else is going to use my laptop.

And before I forget, credit goes to fak3r for coming up with such a great idea. I tried it and found it very useful, which is why I decided to write it up on my own blog with some extensions.

First install the packages rsync (usually already included with a Linux box), openssh-server and lsyncd.

Lsyncd stands for Live Syncing Daemon. I am no expert on this package, but as far as I know it watches a directory tree and runs a command (or another system call) whenever something in it changes; its own settings live in an XML configuration file. Lsyncd relies on the inotify feature of the Linux kernel, which watches folders and reports events.
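To make the "watch a folder, then react" idea concrete, here is a rough sketch. It does not use inotify and is not how lsyncd is implemented: it simply takes two snapshots of a directory and compares them. The function names `snapshot` and `diff` are made up for this illustration.

```python
import os
import tempfile


def snapshot(directory):
    """Map every file path under directory to its (mtime_ns, size)."""
    state = {}
    for folder, _subdirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(folder, name)
            info = os.stat(path)
            state[path] = (info.st_mtime_ns, info.st_size)
    return state


def diff(before, after):
    """Return (created, deleted, modified) path lists between two snapshots."""
    created = sorted(set(after) - set(before))
    deleted = sorted(set(before) - set(after))
    modified = sorted(
        path for path in set(before) & set(after) if before[path] != after[path]
    )
    return created, deleted, modified


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as watched:
        notes = os.path.join(watched, "notes.txt")
        with open(notes, "w") as handle:
            handle.write("v1")
        before = snapshot(watched)

        # Change one file (its size changes) and add a new one.
        with open(notes, "w") as handle:
            handle.write("version 2")
        with open(os.path.join(watched, "photo.jpg"), "w") as handle:
            handle.write("new file")
        after = snapshot(watched)

        created, deleted, modified = diff(before, after)
        print("created:", [os.path.basename(p) for p in created])
        print("modified:", [os.path.basename(p) for p in modified])
```

A real daemon would get these events from the kernel instead of polling, and would then hand the changed folder to rsync.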

Rsync is a tool for transferring files between folders or systems (local or remote). Its algorithm checks what has changed in a file and transfers only the changed parts of the files and folders. Because rsync can sync data both locally and to remote systems, it is widely used for backing up systems and servers on a large scale.
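The "transfer only the changed parts" idea can be sketched in a few lines. This is a simplification and not rsync's actual algorithm (rsync uses a rolling checksum so it can also cope with inserted data); here we only compare blocks at the same position. The block size and function names are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow (assumption)


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-1 digest per fixed-size block of data."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return the indexes of blocks in new that differ from old or are new."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [
        index
        for index, digest in enumerate(new_digests)
        if index >= len(old_digests) or digest != old_digests[index]
    ]


if __name__ == "__main__":
    old = b"aaaabbbbccccdddd"
    new = b"aaaaBBBBccccddddeeee"
    # Block 1 was edited and block 4 was appended, so only those need sending.
    print(changed_blocks(old, new))
```

Only the blocks listed would have to cross the network; the unchanged ones are already present on the other side.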

Assuming you use Ubuntu:

apt-get install lsyncd openssh-server

I assume lsyncd is available through apt-get; if it is not, you need to download the source code from code.google.com and compile it.

wget http://lsyncd.googlecode.com/files/lsyncd-1.26.tar.gz

tar -zxf lsyncd-1.26.tar.gz

cd lsyncd-1.26

./configure

make

make install

You should find an example lsyncd.conf.xml in the folder you extracted. Just copy it to /etc/lsyncd.conf.xml.

Next you need to make alterations to the configuration file on your system. Open lsyncd.conf.xml and go to the portion that describes the source and target. Your job is to set the source to the folder you wish to back up to the Amazon cloud, say /home/ananth/syncdata. So modify the configuration file so that it contains this line:

<source path="/home/ananth/syncdata" />

Next is the target path. Here you need to provide the IP address of the system you want to sync to, that is, the IP address of the Amazon EC2 instance. I'll talk about Amazon EC2 instances and IP addresses in my next post, and I'll make sure to post the link here.

So your target should essentially look like this:

        <target path="10.15.16.17:/home/ec2-user/syncbackup" />

Now in your terminal run

lsyncd --config /etc/lsyncd.conf.xml --debug

If you get any errors, there is some problem with the setup. Since you are syncing data to a machine on the internet, it should be fine. Make sure you read /var/log/messages for clues if the setup goes wrong; you should be able to figure out the errors from there.

Everything done so far is on your own system. The remote host configuration is up next.

From here on I assume that you have an Amazon account. Set up an Amazon EC2 instance and attach an Elastic IP address to it. Make sure your security settings are in order and that port 22 is open (I am sure it will be).

I'll make sure that I update this post.





De-duplication articles

I was going through de-duplication techniques: which companies do what, and the pros and cons of de-duplication. This one article says it all. Wonderful post.

Courtesy : http://nsrd.info/blog/2011/08/07/7-common-problems-with-deduplication/

De-Duplication

Since I worked on de-duplication as part of my undergraduate thesis, I found the articles from NetApp and EMC very interesting.

De-duplication is the process of removing replicas of a file; the file may be of any type (for example .jpg, .txt, .doc). There are two major approaches to data de-duplication: file level and block level.

A few companies employ file-level de-duplication, while the majority employ block-level de-duplication.

Block Level De-duplication :

De-duplication at the data block level compares blocks of data with other subsequent blocks. Block-level de-duplication allows you to de-duplicate data within a given object. If an object (file, database, etc.) contains blocks of data that are identical to each other, block-level de-duplication avoids storing the redundant data and reduces the size of the object in storage.
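Here is a minimal sketch of the block-level idea, assuming fixed-size blocks and SHA-1 as the block fingerprint. Commercial products use their own block sizes and chunking schemes; `deduplicate_blocks`, `rebuild` and the 4-byte block size are only illustrative.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen for the demo (assumption)


def deduplicate_blocks(data, block_size=BLOCK_SIZE):
    """Split data into blocks and keep one copy of each distinct block.

    Returns (store, recipe): store maps digest -> block bytes, and recipe is
    the ordered list of digests needed to rebuild the original data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha1(block).hexdigest()
        store.setdefault(digest, block)  # store the block only the first time
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original data from the stored blocks."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    data = b"ABCD" * 3 + b"WXYZ"
    store, recipe = deduplicate_blocks(data)
    print(len(recipe), "blocks referenced,", len(store), "blocks stored")
    assert rebuild(store, recipe) == data
```

The object is kept as a list of references, and each distinct block is stored once no matter how often it repeats.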

In de-duplication a single copy of the file is maintained and the other copies are turned into references to that copy, which drastically reduces the space used when the file has many redundant copies. The references may be similar to the soft link approach used in Linux.

For example, if an image file of 3MB is stored in 5 different locations, the total space occupied by that file is 15MB. With data de-duplication a single copy of the file is maintained and the rest of the copies become references to that single file location. So the size after de-duplication is far smaller than before, perhaps just slightly more than 3MB.

Identical files may be named differently, which poses a great challenge. Hence the MD5 or SHA-1 hash of each file's content is calculated and checked for duplicates, and links are established between files with the same content. For my project I use Amazon S3 for storing data in the cloud. I found it to be an easy and efficient way of storing and accessing my data. Amazon AWS provides support for various languages such as C#, Java and PHP. The how-tos are provided under the Developer section of the Amazon AWS website.
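The hash comparison described above can be sketched as follows. This is not the code from my project and it does not touch S3; it only groups local files by the SHA-1 of their content, and the function names are made up for the example.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """SHA-1 of a file's content, read in chunks so large files are fine."""
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha1.update(chunk)
    return sha1.hexdigest()


def find_duplicates(directory):
    """Group files under directory by content; return groups with 2+ files."""
    groups = defaultdict(list)
    for folder, _subdirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(folder, name)
            groups[file_digest(path)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        samples = [("a.jpg", b"same"), ("copy.jpg", b"same"), ("b.txt", b"other")]
        for name, content in samples:
            with open(os.path.join(workdir, name), "wb") as handle:
                handle.write(content)
        # a.jpg and copy.jpg have different names but identical content.
        for group in find_duplicates(workdir):
            print([os.path.basename(path) for path in group])
```

Each group that comes back could keep one real copy and replace the rest with references to it.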

The links given below provide some useful resources regarding de-duplication.

http://www.informationweek.com/blog/229205878

http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/134-inline-or-post-process.html/

http://www.evaluatorgroup.com/document/data-de-duplication-%E2%80%93why-when-where-and-how-infostor-article-by-russ-fellows/

And of course Wiki

http://en.wikipedia.org/wiki/Data_deduplication

A lot of research is being carried out on how to decrease storage costs, and de-duplication proves to be an effective tool in this regard.

Minix resources

recomp

How_to_add_system_call

016

These were some of the resources that I used while learning the MINIX operating system. Andrew Tanenbaum has written a book dedicated to MINIX that covers its system calls and file system in depth.

SUMO ROBOT ARENA

This robot was designed by my project mate Manikandan Eshwar and me as part of our Embedded Systems Laboratory. The basic functionality of the robot is as follows:

It is similar to an obstacle detection robot, but when it senses an object it pushes it out of the ring.

The program is coded so that the robot detects white and black. While it is on the white surface of the ring it moves about freely, searching for an opponent. As soon as it senses black it begins to rotate about its axis, because leaving the white area means it is out of the ring and is declared out. The robot was designed on this basis. I'll upload a copy of the ring and the design soon.
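The decision logic can be summed up in a few lines. This is only a sketch in Python for readability, not the code that ran on the robot; the sensor values and action names are hypothetical.

```python
WHITE, BLACK = "white", "black"  # hypothetical floor sensor readings


def next_action(floor_colour, opponent_seen):
    """Pick the robot's next move from its two sensor inputs."""
    if floor_colour == BLACK:
        return "rotate"  # at the edge: turn back towards the ring
    if opponent_seen:
        return "push"    # inside the ring with a target ahead
    return "search"      # inside the ring, nothing detected yet


if __name__ == "__main__":
    for floor, seen in [(WHITE, False), (WHITE, True), (BLACK, True)]:
        print(floor, seen, "->", next_action(floor, seen))
```

Checking the floor colour first means that staying in the ring always takes priority over chasing the opponent.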

HTH

Code and circuit diagram available on request

Simulation of DUP system call in C

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* In-core inode: inode number and reference count. */
struct inode
{
    int inodeno;
    int rc;
} *in[10];

/* User file descriptor entry that points at an inode. */
struct fdnode
{
    int fd;
    struct inode *next;
} *fd[10], *nfd;

/* Directory entry: file name and inode number. */
struct file
{
    char name[10];
    int inodeno;
} *fp[10], *root;

/* Node of the small directory tree (root, etc, usr, bin). */
struct tree
{
    int inodeno;
    char fname[10];
    struct tree *left, *right;
};

int winode;

int namei(const char *name);
int iget(int inode);

int main(void)
{
    char path[30];
    int inode, i, k, j;
    char name[][10] = {"root", "etc", "usr", "bin"};
    char temp[5][10];
    struct tree *node[4], *parent, *newnode;

    /* Inode numbers are 1..999 so that 0 can mean "not found". */
    for (i = 0; i < 4; i++)
    {
        node[i] = (struct tree *)malloc(sizeof(struct tree));
        strcpy(node[i]->fname, name[i]);
        node[i]->inodeno = rand() % 999 + 1;
        node[i]->left = NULL;
        node[i]->right = NULL; /* the original set left twice */
    }
    node[0]->left = node[2];
    node[0]->right = node[1];
    node[1]->left = node[3];

    root = (struct file *)malloc(sizeof(struct file));
    strcpy(root->name, "root");
    root->inodeno = rand() % 999 + 1;
    winode = root->inodeno;

    /* Keep asking until the path names a file under /root/etc,
       /root/etc/bin or /root/usr. */
    for (;;)
    {
        printf("Enter filename:");
        if (scanf("%29s", path) != 1)
            return 1;

        /* Split the path into components, skipping the leading '/'. */
        memset(temp, 0, sizeof(temp));
        k = 1;
        for (j = 0; j < 4; j++)
        {
            for (i = 0; i < 10; i++)
            {
                if (path[k] == '/' || path[k] == '\0')
                {
                    k++;
                    break;
                }
                temp[j][i] = path[k];
                k++;
            }
            if (i > 9)
                i = 9; /* truncate over-long components instead of overflowing */
            temp[j][i] = '\0';
            if (k == (int)strlen(path) + 1)
                break;
        }
        for (i = 0; i <= j; i++)
        {
            printf("%s\n", temp[i]);
        }

        /* Find the directory that should contain the last component. */
        parent = NULL;
        if (strcmp(temp[0], node[0]->fname) == 0)
        {
            if (strcmp(temp[1], node[1]->fname) == 0)
            {
                if (j == 2)
                    parent = node[1]; /* /root/etc/<file> */
                else if (j == 3 && strcmp(temp[2], node[3]->fname) == 0)
                    parent = node[3]; /* /root/etc/bin/<file> */
            }
            else if (j == 2 && strcmp(temp[1], node[2]->fname) == 0)
                parent = node[2]; /* /root/usr/<file> */
        }
        if (parent != NULL)
            break;
        printf("Wrong path\n");
    }

    newnode = (struct tree *)malloc(sizeof(struct tree));
    strcpy(newnode->fname, temp[j]);
    newnode->inodeno = 0; /* not assigned in this simulation */
    newnode->left = NULL;
    newnode->right = NULL;
    parent->right = newnode;

    inode = namei(temp[j]);
    if (inode == 0)
        return 0;

    /* dup: create a new descriptor for the same inode and bump its count. */
    nfd = NULL;
    for (i = 0; i < 10; i++)
    {
        if (in[i]->inodeno == inode)
        {
            in[i]->rc++;
            nfd = (struct fdnode *)malloc(sizeof(struct fdnode));
            nfd->fd = rand() % 1000;
            nfd->next = in[i];
            break;
        }
    }
    printf("UFD AND INODE TABLE(after dup)\n");
    printf("File Desc\tInode No\tRC\n");
    for (i = 0; i < 10; i++)
    {
        printf("%d\t\t%d\t\t%d\n", fd[i]->fd, in[i]->inodeno, in[i]->rc);
    }
    if (nfd != NULL)
        printf("%d\t\t%d\t\t%d\n", nfd->fd, nfd->next->inodeno, nfd->next->rc);
    printf("--------------------------------------------------------------------------------\n");
    return 0;
}

/* Build the file, inode and descriptor tables, then look up name.
   Returns the inode number of the file, or 0 if it does not exist. */
int namei(const char *name)
{
    char fname[10][10] = {"demo", "all", "info", "data", "sam", "exam", "eg", "test", "dev", "etc"};
    int i, found = 0;

    for (i = 0; i < 10; i++)
    {
        fp[i] = (struct file *)malloc(sizeof(struct file));
        strcpy(fp[i]->name, fname[i]);
        fp[i]->inodeno = rand() % 999 + 1;
    }
    printf("FILE SYSTEM HIERARCHY\n");
    printf("Filename\tInode No\n");
    for (i = 0; i < 10; i++)
    {
        printf("%s\t\t%d\n", fp[i]->name, fp[i]->inodeno);
    }
    printf("--------------------------------------------------------------------------------\n");
    for (i = 0; i < 10; i++)
    {
        in[i] = (struct inode *)malloc(sizeof(struct inode));
        in[i]->inodeno = fp[i]->inodeno;
        in[i]->rc = 1;
        fd[i] = (struct fdnode *)malloc(sizeof(struct fdnode));
        fd[i]->fd = rand() % 1000;
    }
    printf("FILE DATA STRUCTURES..\n");
    printf("UFD AND INODE TABLE\n");
    printf("File Desc\tInode No\tRC\n");
    for (i = 0; i < 10; i++)
    {
        printf("%d\t\t%d\t\t%d\n", fd[i]->fd, in[i]->inodeno, in[i]->rc);
    }
    printf("--------------------------------------------------------------------------------\n");
    for (i = 0; i < 10; i++)
    {
        if (strcmp(fp[i]->name, name) == 0)
        {
            winode = fp[i]->inodeno;
            found = 1;
            break;
        }
    }
    if (found && iget(winode) == 1)
        return winode;
    printf("Error... File not available..\n");
    return 0;
}

/* Return 1 if inode is present in the inode cache, 0 otherwise. */
int iget(int inode)
{
    int inodecache[10], i;

    for (i = 0; i < 10; i++)
    {
        inodecache[i] = fp[i]->inodeno;
    }
    for (i = 0; i < 10; i++)
    {
        if (inode == inodecache[i])
            return 1;
    }
    return 0;
}
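For comparison with the simulation, here is a small sketch that calls the real dup through Python's os module on a temporary file. It only shows the visible effect: the new descriptor refers to the same open file, so both descriptors share one file offset. The function name `dup_demo` is made up for the example.

```python
import os
import tempfile


def dup_demo():
    """Show that a duplicated descriptor shares the original's file offset."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello world")
        os.lseek(fd, 0, os.SEEK_SET)
        new_fd = os.dup(fd)          # second descriptor for the same open file
        first = os.read(fd, 5)       # moves the shared offset to 5
        second = os.read(new_fd, 6)  # continues from offset 5
        os.close(new_fd)
        return new_fd != fd, first, second
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(dup_demo())
```

The second read picks up where the first one stopped even though it goes through a different descriptor number, which is the sharing that the reference count in the tables above stands for.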

 Installation
RTMINIX3 can be installed in several ways. It is possible to install it from
an installation CD. Furthermore it is possible to rebuild a standard Minix
3.1.2a installation to RTMINIX3 by patching or copying the source code. We
also provide a pre-installed VMware virtual machine image.
Using the installation CD
Installation of RTMINIX3 is exactly the same as installing Minix 3. You
can use the installation manual of Minix 3 to guide you through the
installation.
Patching and rebuilding a Minix 3.1.2a installation
1. Install Minix 3.1.2a (if not already done).
2. Install GNU Patch. GNU Patch is not available in Packman. Either
build it from source or install the binary by transferring it to your
system (e.g. by downloading it with urlget or FTP).
3. Transfer the patch file to your computer (e.g. using FTP).
4. Create a backup of the source directory:
cp -r /usr/src/ /usr/src_backup/
5. Patch the source directory:
/usr/gnu/bin/patch -p0 < /path/to/rtminix3.patch
6. Rebuild the system:
cd /usr/src/tools && make fresh install
7. Reboot:
reboot
8. Rebuild commands:
cd /usr/src/commands && make all install

Copying the source code and rebuilding a Minix 3.1.2a installation
1. Install Minix 3.1.2a (if not already done).
2. Create a backup of the source directory:
mv /usr/src/ /usr/src_backup/
3. Transfer the source code package (rtminix3.tar.bz2) to your computer
(e.g. using FTP).
4. Unzip the package to rtminix3.tar:
bunzip2 rtminix3.tar.bz2
5. Extract the tarball file to /usr/src/:
tar xvf rtminix3.tar
6. Rebuild the system:
cd /usr/src/tools && make fresh install
7. Reboot:
reboot
8. Rebuild commands:
cd /usr/src/commands && make all install
Using the VMware virtual machine image
1. Unzip the file containing the virtual machine image (RTMINIX3 vmware.zip).
2. Follow the instructions of your VMware software to add an existing virtual machine.
3. If asked for the vmx config file, provide the path to RTMINIX3.vmx

Kickstart Steps:
1. Install the vsftpd package
2. Copy the entire contents of the DVD to /var/ftp/pub
3. Install the createrepo package and build the repository metadata
createrepo -g comps*.xml /var/ftp/pub
4. Set up the yum repo
5. Install system-netboot-tools package
6. Set up the NFS exports by adding the line "/var/ftp/pub *(ro,sync)" to /etc/exports
7. Enable the tftp server in /etc/xinetd.d/tftp by changing disable = yes to disable = no (the sample file further down has the line commented out instead)
8. Set up the DHCP server using the dhcpd.conf file that I prepared in the /home/kickstart folder
9. Use the system-config-kickstart command to set up the anaconda-ks.cfg file and copy it to /var/ftp/pub
10. Install the el5 kernel and set up the headers and dependency packages "kernel-2.6*.el5"
11. Use network boot in the client

sample ks.cfg file

#platform=x86, AMD64, or Intel EM64T
# System authorization information
auth --useshadow --enablemd5
# System bootloader configuration
bootloader --location=mbr
# Partition clearing information
clearpart --none
# Use text mode install
text
# Firewall configuration
firewall --enabled
# Run the Setup Agent on first boot
firstboot --disable
# System keyboard
keyboard us
# System language
lang en_US
# Installation logging level
logging --level=info
# Use NFS installation media
nfs --server=192.168.1.1 --dir=/var/ftp/pub/
# Network information
network --bootproto=dhcp --device=eth0 --onboot=on
# Reboot after installation
reboot
# SELinux configuration
selinux --disabled
# System timezone
timezone Asia/Calcutta
# Install OS instead of upgrade
install
# X Window System configuration information
xconfig --defaultdesktop=GNOME --depth=8 --resolution=800x600
# Disk partitioning information
part / --bytes-per-inode=4096 --fstype="ext3" --size=5000
part /boot --bytes-per-inode=4096 --fstype="ext3" --size=1000
part /home --bytes-per-inode=4096 --fstype="ext3" --size=10000
part swap --bytes-per-inode=4096 --fstype="swap" --size=2000
%packages
@office
@development-libs
@editors
@gnome-software-development
@text-internet
@x-software-development
@virtualization
@gnome-desktop
@dialup
@core
@base
@games
@java
@legacy-software-support
@base-x
@graphics
@printing
@sound-and-video
@admin-tools
@development-tools
@graphical-internet
emacs
mesa-libGLU-devel
kexec-tools
bridge-utils
device-mapper-multipath
xorg-x11-utils
xorg-x11-server-Xnest
xorg-x11-server-Xvfb
libsane-hpaio
imake
-sysreport

dhcpd.conf

#
# DHCP Server Configuration file.
# see /usr/share/doc/dhcp*/dhcpd.conf.sample
#
ddns-update-style interim;
ignore client-updates;

subnet 192.168.1.0 netmask 255.255.255.0 {

# --- default gateway
option routers 192.168.1.1;
option subnet-mask 255.255.255.0;

# option nis-domain "domain.org";
# option domain-name "domain.org";
# option domain-name-servers 192.168.1.1;

option time-offset -18000; # Eastern Standard Time
# option ntp-servers 192.168.1.1;
# option netbios-name-servers 192.168.1.1;
# --- Selects point-to-point node (default is hybrid). Don't change this unless
# --- you understand Netbios very well
# option netbios-node-type 2;

range dynamic-bootp 192.168.1.10 192.168.1.20;
filename "linux-install/pxelinux.0";
next-server 192.168.1.1;
default-lease-time 21600;
max-lease-time 43200;

# we want the nameserver to appear at a fixed address
host ns {
next-server marvin.redhat.com;
hardware ethernet 12:34:56:78:AB:CD;
fixed-address 207.175.42.254;
}
}

tftp configuration

# default: off
# description: The tftp server serves files using the trivial file transfer \
# protocol. The tftp protocol is often used to boot diskless \
# workstations, download configuration files to network-aware printers, \
# and to start the installation process for some operating systems.
service tftp
{
socket_type = dgram
protocol = udp
wait = yes
user = root
server = /usr/sbin/in.tftpd
server_args = -s /tftpboot
# disable = no
per_source = 11
cps = 100 2
flags = IPv4
}

Set up NFS if needed

Social Networking Website

This was the project we worked on during our 4th semester.

Languages used: PHP, AJAX.

Design components: headers and menus made with Adobe Photoshop and Adobe Flash

Screenshots coming soon 😛