Download Sequence from Accession Number using Perl


If you need to download a batch of sequences from a public database such as GenBank by accession number, the following Perl script that I’ve written may come in handy.

This script uses the BioPerl module “Bio::DB::GenBank”. All the accession numbers must be listed in the file accnumber.txt, separated by commas. The file accnumber.txt must also be in the directory from which the script is run (typically the same directory as the Perl script). After successful execution, the script generates a file sequence_dwnl.fa containing the sequences in FASTA format.
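To make the expected input format concrete, here is a minimal sketch of how one line of accnumber.txt can be turned into a clean list of accession numbers. The helper name parse_accessions and the identifiers ACC0001 to ACC0003 are placeholders made up for illustration; they are not real accession numbers and not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one line of accnumber.txt into a clean list.
sub parse_accessions
{
	my ($line) = @_;
	chomp $line;

	# Remove all whitespace from each comma-separated field
	my @fields = map { my $f = $_; $f =~ s/\s+//g; $f } split /,/, $line;

	# Drop fields that are empty after cleaning (e.g. from ",,")
	return grep { length } @fields;
}

# Placeholder identifiers, not real accession numbers
my @acc = parse_accessions("ACC0001, ACC0002 ,,ACC0003\n");
print join("|", @acc), "\n";
```

Stray spaces and empty fields are tolerated this way, so a trailing comma or a doubled comma in the input file does not stop the run.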

#!/usr/bin/perl

use strict;
use warnings;

use Bio::DB::GenBank;

# Read accession numbers from accnumber.txt and write FASTA records to sequence_dwnl.fa
open (my $input_file, '<', 'accnumber.txt') or die "Cannot open accnumber.txt: $!";
open (my $output_file, '>', 'sequence_dwnl.fa') or die "Cannot open sequence_dwnl.fa: $!";

# One database handle is enough for all requests
my $db_obj = Bio::DB::GenBank->new;

while (my $line = <$input_file>)
{
	chomp $line;

	foreach my $acc_no (split(",", $line))
	{
		# Strip stray whitespace and skip empty fields
		$acc_no =~ s/\s//g;
		next if $acc_no eq '';

		# A failed lookup may throw or return undef, so guard against both
		my $seq_obj = eval { $db_obj->get_Seq_by_acc($acc_no) };
		unless ($seq_obj)
		{
			warn "Could not retrieve: $acc_no\n";
			next;
		}

		print $output_file ">$acc_no\n";
		print $output_file $seq_obj->seq, "\n";
		print "Sequence Downloaded:", "\t", $acc_no, "\n";
	}
}

close $output_file;
close $input_file;
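
The script above writes each sequence on a single line. FASTA files conventionally wrap sequence lines at a fixed width, so the following is a small sketch of a formatter that does that. The helper name fasta_record, the default width of 60, and the identifier ACC0001 are assumptions chosen for illustration. BioPerl's Bio::SeqIO module can also write FASTA directly if you prefer not to format records by hand.

```perl
use strict;
use warnings;

# Hypothetical helper: build a FASTA record with the sequence wrapped at $width
sub fasta_record
{
	my ($id, $seq, $width) = @_;
	$width ||= 60;    # assumed default line width

	my $record = ">$id\n";

	# Emit the sequence in fixed-width chunks, one chunk per line
	for (my $pos = 0; $pos < length($seq); $pos += $width)
	{
		$record .= substr($seq, $pos, $width) . "\n";
	}

	return $record;
}

# Placeholder identifier and an 80-base dummy sequence
print fasta_record('ACC0001', 'ACGT' x 20);
```

In the download script, the two print statements that write the header and the sequence could be replaced by a single print of fasta_record($acc_no, $seq_obj->seq).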