Python script for scanning Docker logs

This is a simple Docker container log scanner written in Python. I was trying to separate real views from bots on my music discovery tools. It uses the docker Python library (pip install docker should grab it).

'''
a quick python script for accessing + searching Docker log files

I used it to try and sort real visitors from fake ones
'''

import docker
client = docker.from_env()
containerlist = client.containers.list()
ignorelist = []
getlist = ['bestof', 'favicon', '/database', 'GET / ', '/update' ]
#getlist = ['GET /bestof']
for containername in containerlist:
    if containername.name not in ignorelist:
        print(containername.name)
        logs = containername.logs().decode().split('\n')
        #print(logs)
        for containerlogs in logs:
            if any(word in containerlogs for word in getlist):
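                # splitting on double quotes gives 9 parts when the line
                # has exactly four quoted fields
                newstr = containerlogs.split('"')
                if len(newstr) == 9:
                    #print(newstr)
                    ua = newstr[5][0:200]  # user agent, capped at 200 chars
                    ip = newstr[7]         # last quoted field
                    ref = newstr[3]        # referrer
                    visited = newstr[1]    # request line, e.g. GET / HTTP/1.1
                    # fall back to the first token on the line if that field is empty
                    if ip == '-':
                        ip = newstr[0].split(' ')[0]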
                    print(f'possible non-bot visitor: {ip} visited {visited}, related: {ref}, {ua}')
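
To make the field positions easier to see, here is a minimal sketch of the same split applied to one line, with no Docker needed. The sample line is made up, and its layout (four quoted fields, the last one holding an address or '-') is an assumption inferred from the indexes the script uses; parse_line is a hypothetical helper, not part of the original script.

```python
# Hypothetical sample line; the layout is assumed from the indexes used above.
sample = ('172.18.0.1 - - [01/Jan/2021:00:00:00 +0000] "GET /bestof HTTP/1.1" 200 512 '
          '"-" "Mozilla/5.0 (X11; Linux x86_64)" "203.0.113.7"')


def parse_line(line):
    """Return the fields as a dict, or None if the line doesn't split into 9 parts."""
    parts = line.split('"')
    if len(parts) != 9:
        return None
    ip = parts[7]
    if ip == '-':
        # same fallback as the script: first token on the line
        ip = parts[0].split(' ')[0]
    return {'ip': ip, 'visited': parts[1], 'ref': parts[3], 'ua': parts[5][:200]}


print(parse_line(sample))
```

Lines that don't match the assumed layout (startup messages, tracebacks and so on) come back as None, which mirrors how the script silently skips them.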

WordPress code blocks didn’t work with Python when I posted this, but you can still copy and paste (edit: now they do~). Here are some example results:

possible non-bot visitor: 114.119.142.97 visited GET / HTTP/1.1, related: -, Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)

possible non-bot visitor: 180.163.220.4 visited GET / HTTP/1.1, related: http://baidu.com/, Mozilla/5.0 (Linux; U; Android 8.1.0; zh-CN; EML-AL00 Build/HUAWEIEML-AL00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.108 baidu.sogo.uc.UCBrowser/11.9.4.974 UWS/2.13.1.48 Mob

possible non-bot visitor: 125.64.94.136 visited GET /favicon.ico HTTP/1.1, related: -, Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36

A few results; of course, I mostly got bots.

I had a theory that favicon requests might only come from real users, people using actual browsers. I guess that was wrong (Selenium does exist, and Google has its own favicon scraper). I’ve noticed fewer bots hit /database and /bestof, so maybe I could filter it down from there. Bots also tend not to have a referral link (I think that’s what it is? I’ve labelled it as “related”).
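
Those hunches could be turned into a rough filter along these lines. This is only a sketch: looks_human, BOT_TOKENS and QUIET_PATHS are names I made up, the token list is a guess and nowhere near exhaustive, and a missing referrer doesn't prove anything on its own (people typing the URL directly won't send one either).

```python
# Assumed, incomplete list of substrings that self-identifying bots put in their user agent.
BOT_TOKENS = ('bot', 'spider', 'crawl')
# Paths that seemed to get fewer bot hits.
QUIET_PATHS = ('/database', '/bestof')


def looks_human(visited, ref, ua):
    """Rough heuristic: no bot token in the UA, a referrer present, and a quiet path."""
    ua_lower = ua.lower()
    if any(token in ua_lower for token in BOT_TOKENS):
        return False
    if ref in ('', '-'):
        return False
    return any(path in visited for path in QUIET_PATHS)


print(looks_human('GET /bestof HTTP/1.1', 'http://example.com/',
                  'Mozilla/5.0 (X11; Linux x86_64)'))
```

It takes the same visited, ref and ua values the script already extracts, so it could sit in front of the final print. Anything that fakes a browser user agent and sends a referrer will still slip through.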

I feel like I could have used this in the past, but now I can’t remember why I needed it.
