---
title: README
emoji: 🏆
colorFrom: yellow
colorTo: yellow
sdk: static
pinned: false
---

BigBanyanTree is an initiative that helps engineering colleges set up their own data engineering clusters and builds interest in data processing and analysis with tools such as Apache Spark.

As part of this initiative, we have open-sourced datasets processed from Common Crawl data.

Each dataset offers two subsets with the following columns:

- `script_extraction`: ["ip", "host", "server", "script_src_attrs", "year"]
- `ipmaxmind`: ["ip", "host", "server", "postal_code", "latitude", "longitude", "accuracy_radius", "continent_code", "continent_name", "country_iso_code", "subdivision_code", "city_name", "metro_code", "time_zone", "year"]
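
The column lists above can be written down as a small lookup for sanity-checking rows before analysis. This is a minimal sketch in plain Python: the column names are copied from this README, while `SUBSET_COLUMNS`, `shared_columns`, and `missing_columns` are hypothetical helper names introduced here for illustration and are not part of the datasets.

```python
# Column names copied from this README; the helper names are illustrative only.
SUBSET_COLUMNS = {
    "script_extraction": ["ip", "host", "server", "script_src_attrs", "year"],
    "ipmaxmind": [
        "ip", "host", "server", "postal_code", "latitude", "longitude",
        "accuracy_radius", "continent_code", "continent_name",
        "country_iso_code", "subdivision_code", "city_name", "metro_code",
        "time_zone", "year",
    ],
}


def shared_columns(first: str, second: str) -> list[str]:
    """Return the columns present in both subsets, in the order of `first`."""
    other = set(SUBSET_COLUMNS[second])
    return [col for col in SUBSET_COLUMNS[first] if col in other]


def missing_columns(subset: str, row: dict) -> list[str]:
    """Return the expected columns of `subset` that are absent from `row`."""
    return [col for col in SUBSET_COLUMNS[subset] if col not in row]


if __name__ == "__main__":
    # Columns that appear in both subsets.
    print(shared_columns("script_extraction", "ipmaxmind"))

    # A made-up, incomplete row used only to show the check.
    example_row = {"ip": "192.0.2.1", "host": "example.org", "year": 2024}
    print(missing_columns("script_extraction", example_row))
```

Both subsets list `ip`, `host`, `server`, and `year`, so those are the columns a check like this reports as shared.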