This is a simple Docker container log scanner written in Python. I was trying to filter out real views from bots on my music discovery tools. It uses the docker Python library – pip install docker should grab it.
'''
a quick python script for accessing + searching Docker log files
I used it to try and sort real visitors from fake ones
'''
import docker
client = docker.from_env()
containerlist = client.containers.list()
ignorelist = []
getlist = ['bestof', 'favicon', '/database', 'GET / ', '/update' ]
#getlist = ['GET /bestof']
for containername in containerlist:
    if containername.name not in ignorelist:
        print(containername.name)
        # logs() returns bytes, so decode and split into lines
        logs = containername.logs().decode().split('\n')
        #print(logs)
        for containerlogs in logs:
            if any(word in containerlogs for word in getlist):
                # splitting on double quotes separates the quoted log fields
                newstr = containerlogs.split('"')
                # compare with ==, not "is" (identity checks on ints are unreliable)
                if len(newstr) == 9:
                    #print(newstr)
                    ua = newstr[5][0:200]
                    ip = newstr[7]
                    ref = newstr[3]
                    visited = newstr[1]
                    if ip == '-':
                        # fall back to the first token on the line
                        ip = newstr[0].split(' ')[0]
                    print(f'possible non-bot visitor: {ip} visited {visited}, related: {ref}, {ua}')
WordPress code blocks don’t work with python, but you can still copy and paste. (edit: now they do~) Here are some example results:
possible non-bot visitor: 114.119.142.97 visited GET / HTTP/1.1, related: -, Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)
possible non-bot visitor: 180.163.220.4 visited GET / HTTP/1.1, related: http://baidu.com/, Mozilla/5.0 (Linux; U; Android 8.1.0; zh-CN; EML-AL00 Build/HUAWEIEML-AL00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.108 baidu.sogo.uc.UCBrowser/11.9.4.974 UWS/2.13.1.48 Mob
possible non-bot visitor: 125.64.94.136 visited GET /favicon.ico HTTP/1.1, related: -, Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36
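The quote-splitting step from the script can be pulled into its own function and tried without Docker running. This is only a rough sketch: it assumes each log line has four double-quoted fields (request, referrer, user agent, forwarded IP), which is what the length-9 check implies. The name parse_line and the sample line are made up for illustration; the sample is not from my real logs.

```python
def parse_line(line):
    """Split one access-log line on double quotes.

    Returns a dict of fields, or None when the line does not
    contain the expected four quoted fields (9 parts after splitting).
    """
    parts = line.split('"')
    if len(parts) != 9:
        return None
    ip = parts[7]
    if ip == '-':
        # no forwarded address logged, so use the first token on the line
        ip = parts[0].split(' ')[0]
    return {
        'ip': ip,
        'visited': parts[1],
        'ref': parts[3],
        'ua': parts[5][:200],  # cap very long user agents
    }

# Hypothetical sample line using a documentation-range IP address
sample = ('203.0.113.7 - - [01/Jan/2021:00:00:00 +0000] '
          '"GET /bestof HTTP/1.1" 200 512 "-" '
          '"Mozilla/5.0 (X11; Linux x86_64)" "-"')
print(parse_line(sample))
```

Lines that don't match the expected shape come back as None, so they can be skipped the same way the original length check skips them.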
I had a theory that favicon requests might only come from real users, people using actual browsers – I guess that was wrong (Selenium does exist, and Google has its own favicon scraper). I’ve noticed fewer bots hit /database and /bestof, so maybe I could filter it down from there. I notice bots tend not to have a referrer (I think that’s what that field is? I’ve labelled it as “related”).
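Those two hints (a bot-looking user agent, and no referrer) could be turned into a small check. This is a sketch under assumptions, not something I actually ran against the logs: the marker list and the name bot_hints are made up, and a missing referrer is only a weak signal, since real people typing the URL directly won't send one either.

```python
# Assumed substrings that often appear in crawler user agents; extend as needed
BOT_MARKERS = ('bot', 'spider', 'crawl')

def bot_hints(ua, ref):
    """Return a list of reasons a request looks automated (empty if none)."""
    hints = []
    if any(marker in ua.lower() for marker in BOT_MARKERS):
        hints.append('ua')
    if ref in ('-', ''):
        # weak signal: direct visits by real users have no referrer either
        hints.append('no-referrer')
    return hints

petal = 'Mozilla/5.0 (Linux; Android 7.0;) (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)'
print(bot_hints(petal, '-'))
print(bot_hints('Mozilla/5.0 (X11; Linux x86_64) Chrome/86.0.4240.111', 'http://baidu.com/'))
```

An empty list doesn't prove a human visitor, it only means neither hint fired.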
I feel like I could have used this in the past, but now I can’t remember why I needed it.