Python - Sqlite

From PeformIQ Upgrade
Revision as of 12:54, 25 January 2008 by PeterHarding

References

Examples

GeoLite

Tables

CREATE TABLE locations(
                locid    INTEGER PRIMARY KEY,
                country TEXT,
                region    TEXT,
                city    TEXT,
                postalCode TEXT,
                latitude REAL,
                longitude REAL,
                dmaCode INTEGER,
                areaCode INTEGER)

CREATE TABLE blocks(
                startIpNum INTEGER,
                endIpNum INTEGER,
                locId INTEGER)

Data

Blocks table has 2,776,436 rows
Locations table has 159,488 rows
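
As a rough illustration of how the two tables fit together, the sketch below creates them in an in-memory SQLite database and joins a block to its location. It assumes Python 3 and the standard sqlite3 module; the function name `lookup` and the two inserted rows are made-up placeholders, not real GeoLite data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE locations(
    locid INTEGER PRIMARY KEY, country TEXT, region TEXT, city TEXT,
    postalCode TEXT, latitude REAL, longitude REAL,
    dmaCode INTEGER, areaCode INTEGER)""")
cur.execute("""CREATE TABLE blocks(
    startIpNum INTEGER, endIpNum INTEGER, locId INTEGER)""")

# Placeholder rows for illustration only (not taken from the real data set).
cur.execute("INSERT INTO locations VALUES (?,?,?,?,?,?,?,?,?)",
            (1, "XX", "00", "Example City", "0000", 1.5, 2.5, None, None))
cur.execute("INSERT INTO blocks VALUES (?,?,?)", (1000, 1999, 1))
conn.commit()

def lookup(ipnum):
    """Return (country, city) for an IP number, or None if no block covers it."""
    return cur.execute(
        "SELECT locations.country, locations.city "
        "FROM blocks JOIN locations ON locations.locid = blocks.locId "
        "WHERE ? BETWEEN blocks.startIpNum AND blocks.endIpNum",
        (ipnum,)).fetchone()

print(lookup(1500))   # inside the placeholder block
print(lookup(5000))   # outside every block
```

Each row in blocks points at one row in locations through locId, so a lookup is a range test on blocks followed by a primary-key join.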

Instructions

GeoIP Country CSV Text Files

MaxMind GeoIP databases are available in a Comma Separated Value (CSV) format, in addition to the binary format. These CSV files generally contain IP Address range and geographical data for all publicly assigned IPv4 addresses.

Due to the large size of geolocation databases, we generally recommend using our binary format with one of our APIs, since they are highly optimized for speed and disk space. On the other hand, if you have a requirement to import the data into a SQL database, the CSV format is recommended. We have listed some guidelines for importing and querying the data with a SQL database.

CSV Format

The CSV file contains six fields:

    * Beginning IP Address
    * Ending IP Address
    * Beginning IP Number*
    * Ending IP Number*
    * ISO 3166 Country Code
    * Country Name

This is a sample of how the CSV file is structured:

"begin_ip","end_ip","begin_num","end_num","country","name"
"61.88.0.0","61.91.255.255","1029177344","1029439487","AU","Australia"
"61.92.0.0","61.93.255.255","1029439488","1029570559","HK","Hong Kong"
"61.94.0.0","61.94.7.255","1029570560","1029572607","ID","Indonesia"

* Beginning IP Number and Ending IP Number are calculated as follows:

ipnum = 16777216*w + 65536*x + 256*y + z   (1)

where

IP Address = w.x.y.z

The reverse of this formula is

w = int ( ipnum / 16777216 ) % 256;
x = int ( ipnum / 65536    ) % 256;
y = int ( ipnum / 256      ) % 256;
z = int ( ipnum            ) % 256;

Where % is the mod operator.
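
Formula (1) and its reverse translate almost directly into Python. The following is a minimal sketch, assuming Python 3; the function names `ip_to_num` and `num_to_ip` are illustrative and do not come from MaxMind. The test values are taken from the first row of the CSV sample.

```python
def ip_to_num(ip):
    """Apply formula (1): ipnum = 16777216*w + 65536*x + 256*y + z."""
    w, x, y, z = (int(part) for part in ip.split("."))
    return 16777216 * w + 65536 * x + 256 * y + z

def num_to_ip(ipnum):
    """Invert formula (1) using integer division and the mod operator."""
    w = (ipnum // 16777216) % 256
    x = (ipnum // 65536) % 256
    y = (ipnum // 256) % 256
    z = ipnum % 256
    return "%d.%d.%d.%d" % (w, x, y, z)

# Values from the first row of the CSV sample.
print(ip_to_num("61.88.0.0"))    # 1029177344
print(num_to_ip(1029439487))     # 61.91.255.255
```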

Here is sample Perl code to convert the IP number to an IP address:

sub numToStr {
  my ($ipnum) = @_;
  my $z = $ipnum % 256;
  $ipnum >>= 8;
  my $y = $ipnum % 256;
  $ipnum >>= 8;
  my $x = $ipnum % 256;
  $ipnum >>= 8;
  my $w = $ipnum % 256;
  return "$w.$x.$y.$z";
}
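
For readers working in Python rather than Perl, the same shift-based conversion could look like the sketch below. It assumes Python 3; `num_to_str` is an illustrative name, and the standard ipaddress module is used only as a cross-check.

```python
import ipaddress

def num_to_str(ipnum):
    """Peel off one octet at a time, lowest first, as the Perl routine does."""
    octets = []
    for _ in range(4):
        octets.append(ipnum & 0xFF)   # same as ipnum % 256
        ipnum >>= 8
    return ".".join(str(o) for o in reversed(octets))

# 1029177344 is the begin_num of the first row in the CSV sample.
print(num_to_str(1029177344))                   # 61.88.0.0
# The standard library gives the same answer.
print(str(ipaddress.IPv4Address(1029177344)))   # 61.88.0.0
```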

It is useful to have the IP Number if you are performing IP Address lookups using a database. For example, the following queries find the country for IP Address 24.24.24.24:

SQL Query

SELECT ip_country FROM geoip WHERE 404232216 BETWEEN begin_ip_num AND end_ip_num

MySQL Query

SELECT ip_country FROM geoip WHERE 404232216 >= begin_ip_num AND
	404232216 <= end_ip_num

Here we used formula (1) to compute the IP Number for 24.24.24.24:

404232216 = 16777216*24 + 65536*24 + 256*24 + 24
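
The BETWEEN query can be tried out against SQLite with nothing but the three rows from the CSV sample. This is a sketch assuming Python 3; the `geoip` table layout is inferred from the column names used in the queries, and `country_for` is an illustrative helper name.

```python
import csv
import io
import sqlite3

# Header plus the three data rows from the CSV sample.
SAMPLE = '''"begin_ip","end_ip","begin_num","end_num","country","name"
"61.88.0.0","61.91.255.255","1029177344","1029439487","AU","Australia"
"61.92.0.0","61.93.255.255","1029439488","1029570559","HK","Hong Kong"
"61.94.0.0","61.94.7.255","1029570560","1029572607","ID","Indonesia"
'''

conn = sqlite3.connect(":memory:")
# Assumed table layout, based on the columns the queries refer to.
conn.execute("CREATE TABLE geoip("
             "begin_ip_num INTEGER, end_ip_num INTEGER, ip_country TEXT)")

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
conn.executemany(
    "INSERT INTO geoip VALUES (?, ?, ?)",
    ((int(row[2]), int(row[3]), row[4]) for row in reader))

def country_for(ipnum):
    """Return the country code whose range contains ipnum, or None."""
    row = conn.execute(
        "SELECT ip_country FROM geoip "
        "WHERE ? BETWEEN begin_ip_num AND end_ip_num", (ipnum,)).fetchone()
    return row[0] if row else None

print(country_for(1029439489))   # 61.92.0.1 falls in the HK range
print(country_for(404232216))    # 24.24.24.24 is not covered by the sample
```

Using the csv module rather than splitting on commas keeps quoted fields intact even if a value contains a comma.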

Guides

For more information on importing GeoIP CSV files into MySQL, see HOW-TO Import the MaxMind GeoIP Free Country CSV file into MySQL and save diskspace.

For more information on importing GeoIP CSV files into Oracle 8i+ with PL/SQL and SQL*Loader files included, see GeoIP01.zip on Sascha Pfalz's download page.

For more information on importing GeoIP CSV files into MS Access, see How to install the MaxMind GeoIP CSV databases into an MS Access Database.

Script

'''geolite
GeoLite City is a free IP-to-city database provided by MaxMind.
They provide a C API (and a Python wrapper) for the database.
If you can't compile the C sources on your server (or get a binary
version), this script might be helpful for you.
The script puts the GeoIP data in a SQLite database, and provides
interfaces for updating and searching the database.

To use this script, get the database in CSV format:
http://www.maxmind.com/app/geolitecity

You also need Python 2.5 or later for this script (the sqlite3 module is used).
'''

import sqlite3 as sqlite
import os

def dottedQuadToNum(ip):
    "convert decimal dotted quad string to integer"

    # int() replaces long(); it handles values of any size
    hexn = ''.join(["%02X" % int(i) for i in ip.split('.')])
    return int(hexn, 16)


def cursorToDict(cursor):
    "return the next row as a dict keyed by column name, or None if there is no row"
    val = cursor.fetchone()
    if val is None:
        return None
    return dict([(col[0], val[i]) for i, col in enumerate(cursor.description)])

def test():
    "time a single lookup using a raw connection"
    from timeit import default_timer as clock
    x = sqlite.connect('geolite.db')
    y = x.cursor()
    ip = dottedQuadToNum("84.108.189.94")
    res = y.execute('select * from blocks,locations where locations.locid = blocks.locid AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum', [ip,ip])
    begin = clock()
    f = res.fetchone()
    end = clock()
    y.close()
    x.close()
    return end-begin, f

def test2():
    "time a single lookup through the GeoLiteDB class"
    from timeit import default_timer as clock
    x = GeoLiteDB()
    x.connect()
    begin = clock()
    x.ipLocation("84.108.189.94")
    end = clock()
    x.close()
    return end - begin


def createDB(dbPath = 'geolite.db', locationsPath='GeoLiteCity-Location.csv', blocksPath='GeoLiteCity-Blocks.csv', warnOnDelete = True):
    "build the SQLite database from the two GeoLite City CSV files"
    if os.path.exists(dbPath):
        if warnOnDelete:
            print("file %s will be deleted. Press any key to continue, or 'n' to abort..." % os.path.abspath(dbPath))
            if getch() == 'n':
                print('aborted.')
                return None
        os.remove(os.path.abspath(dbPath))
    conn = sqlite.connect(dbPath)
    cursor = conn.cursor()
    try:
        cursor.execute('''CREATE TABLE locations(
                locid    INTEGER PRIMARY KEY,
                country TEXT,
                region    TEXT,
                city    TEXT,
                postalCode TEXT,
                latitude REAL,
                longitude REAL,
                dmaCode INTEGER,
                areaCode INTEGER)''')

        cursor.execute('''CREATE TABLE blocks(
                startIpNum INTEGER,
                endIpNum INTEGER,
                locId INTEGER)''')

        locations = open(locationsPath,'r')
        print('parsing locations. This will take a while.')
        print(locations.readline().strip()) #should print copyright note
        print(locations.readline().strip()) #should print column names
        # note: a plain split(',') assumes no field contains an embedded comma
        lines = ([x.strip('"') for x in line.strip().split(',')] for line in locations)
        cursor.executemany('insert into locations values (?,?,?,?,?,?,?,?,?)', lines)
        locations.close()

        blocks = open(blocksPath,'r')
        print('parsing blocks. This will take longer.')
        print(blocks.readline().strip()) #should print copyright note
        print(blocks.readline().strip()) #should print column names
        lines = ([x.strip('"') for x in line.strip().split(',')] for line in blocks)
        cursor.executemany('insert into blocks values (?,?,?)', lines)
        blocks.close()

#        cursor.execute('''CREATE UNIQUE INDEX startIpNumIx ON blocks(startIpNum);''')
#        cursor.execute('''CREATE UNIQUE INDEX endIpNumIx ON blocks(endIpNum);''')

        conn.commit()

        print('analyze')
        cursor.execute('''ANALYZE;''')

        numBlocks = cursor.execute('select count(*) from blocks').fetchone()[0]
        numLocations = cursor.execute('select count(*) from locations').fetchone()[0]

        return numBlocks, numLocations

    finally:
        cursor.close()
        conn.close()


class GeoLiteDB:
    def __init__(self, dbPath = 'geolite.db'):
        self.dbPath = dbPath
        self._conn = None
        self._cursor = None

    def connect(self):
        if self._conn:
            # string exceptions are no longer supported; raise a real exception
            raise RuntimeError('database already opened')
        self._conn = sqlite.connect(self.dbPath)
        self._cursor = self._conn.cursor()
    def close(self):
        if not self._conn:
            raise RuntimeError('database was not opened')
        self._cursor.close()
        self._conn.close()
        # reset so that autoConnect() can reopen the database later
        self._conn = None
        self._cursor = None
    def autoConnect(self):
        if not self._conn:
            self.connect()
    def countBlocks(self):
        self.autoConnect()
        return self._cursor.execute('select count(*) from blocks').fetchone()[0]
    def countLocations(self):
        self.autoConnect()
        return self._cursor.execute('select count(*) from locations').fetchone()[0]
    def ipLocation(self, ip):
        self.autoConnect()
        if isinstance(ip,str):
            ip = dottedQuadToNum(ip)
        return cursorToDict(self._cursor.execute('select * from blocks,locations where locations.locid = blocks.locid AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum', [ip,ip]))

#cross platform getch, from http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/134892
class _Getch:
    """Gets a single character from standard input.  Does not echo to the
screen."""
    def __init__(self):
        try:
            self.impl = _GetchWindows()
        except ImportError:
            self.impl = _GetchUnix()

    def __call__(self): return self.impl()


class _GetchUnix:
    def __init__(self):
        import tty, sys

    def __call__(self):
        import sys, tty, termios
        fd = sys.stdin.fileno()
        old_settings = termios.tcgetattr(fd)
        try:
            tty.setraw(sys.stdin.fileno())
            ch = sys.stdin.read(1)
        finally:
            termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)
        return ch


class _GetchWindows:
    def __init__(self):
        import msvcrt

    def __call__(self):
        import msvcrt
        return msvcrt.getch()


getch = _Getch()
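
The two CREATE INDEX statements in createDB are commented out, so the range query most likely has to scan the whole blocks table. One possible speed-up, offered here as an assumption and not as part of the original script, is to index startIpNum and fetch only the nearest block at or below the address, then check its end. The sketch assumes Python 3 and non-overlapping ranges; `find_block` and the inserted rows are made-up placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blocks("
             "startIpNum INTEGER, endIpNum INTEGER, locId INTEGER)")
# Placeholder, non-overlapping ranges for illustration only.
conn.executemany("INSERT INTO blocks VALUES (?,?,?)",
                 [(100, 199, 1), (200, 299, 2), (400, 499, 3)])
conn.execute("CREATE INDEX startIpNumIx ON blocks(startIpNum)")

def find_block(ipnum):
    """Return locId for the block containing ipnum, or None."""
    # The index lets SQLite jump to the last block starting at or below ipnum.
    row = conn.execute(
        "SELECT endIpNum, locId FROM blocks "
        "WHERE startIpNum <= ? ORDER BY startIpNum DESC LIMIT 1",
        (ipnum,)).fetchone()
    if row is None or ipnum > row[0]:
        return None   # before the first block, or in a gap between blocks
    return row[1]

print(find_block(250))   # inside the second placeholder block
print(find_block(350))   # falls in the gap between blocks
```

The resulting locId can then be looked up in locations by primary key, which avoids the join over the full blocks table.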