File: README.md | Updated: 11/18/2025
This library provides a fast, standalone way to read and write the WARC format commonly used in web archives. It requires Python 3.7+ and has six as its only external dependency.
Key Features:
This library is a spin-off of the WARC reading and writing component of pywb, a key component of Webrecorder.
pip install warcio
For optional features:
pip install "warcio[all]"
Iterate over WARC records and print URLs of response records:
from warcio.archiveiterator import ArchiveIterator

with open('path/to/file.warc.gz', 'rb') as stream:
    for record in ArchiveIterator(stream):
        if record.rec_type == 'response':
            print(record.rec_headers.get_header('WARC-Target-URI'))
Capture HTTP/S traffic directly to a WARC file:
from warcio.capture_http import capture_http
import requests  # Must be imported after capture_http

with capture_http('example.warc.gz'):
    requests.get('https://example.com/')
warcio is licensed under the Apache 2.0 License and is part of the Webrecorder project.