NER_TAGS = [
    {"name": "organization", "label": "Organization", "color": "#F1CBCB"},
    {"name": "metric", "label": "Metric", "color": "#CAEACA"}
]

NER_DATA = [
    [
        {"text": "At "},
        {"text": "Santander", "tag": "organization"},
        {"text": " our mission is to help people and businesses prosper. "},
        {"text": "We are always looking for ways to help our customers understand their financial health "},
        {"text": "and identify which products and services might help them achieve their monetary goals. "},
        {"text": "Our data science team is continually challenging our machine learning algorithms, working with "},
        {"text": "the global data science community to make sure we can more accurately identify new ways "},
        {"text": "to solve our most common challenge, binary classification problems such as: "},
        {"text": "is a customer satisfied? Will a customer buy this product? Can a customer pay this loan? "},
        {"text": "In this challenge, we invite Kagglers to help us identify which customers will make "},
        {"text": "a specific transaction in the future, irrespective of the amount of money transacted. "},
        {"text": "The data provided for this competition has the same structure as the real data we have available "},
        {"text": "to solve this problem."}
    ],
    [
        {"text": "Many people struggle to get loans due to insufficient or non-existent credit histories. "},
        {"text": "And, unfortunately, this population is often taken advantage of by untrustworthy lenders. "},
        {"text": "Home Credit", "tag": "organization"},
        {"text": " strives to broaden financial inclusion for the unbanked population by providing "},
        {"text": "a positive and safe borrowing experience. "},
        {"text": "In order to make sure this underserved population has a positive loan experience, "},
        {"text": "Home Credit", "tag": "organization"},
        {"text": " makes use of a variety of alternative data--including telco & transactional information"},
        {"text": "--to predict their clients' repayment abilities. While "},
        {"text": "Home Credit", "tag": "organization"},
        {"text": " is currently using various statistical and machine learning methods to make "},
        {"text": "predictions, they're challenging Kagglers to help them unlock "},
        {"text": "the full potential of their data. "},
        {"text": "Doing so will ensure that clients capable of repayment are not rejected "},
        {"text": "and that loans are given with a principal, maturity, and repayment calendar that will empower "},
        {"text": "their clients to be successful."}
    ],
    [
        {"text": "Imagine standing at the check-out counter at the grocery store with a long line behind you "},
        {"text": "and the cashier not-so-quietly announces that your card has been declined. "},
        {"text": "In this moment, you probably aren’t thinking about the data science that determined your fate. "},
        {"text": "Embarrassed, and certain you have the funds to cover everything needed for an epic "},
        {"text": "nacho party for 50 of your closest friends, you try your card again. "},
        {"text": "Same result. As you step aside and allow the cashier to tend to the next customer, "},
        {"text": "you receive a text message from your bank. "},
        {"text": "'Press 1 if you really tried to spend $500 on cheddar cheese.' "},
        {"text": "While perhaps cumbersome (and often embarrassing) in the moment, "},
        {"text": "this fraud prevention system is actually saving consumers millions of dollars per year. "},
        {"text": "Researchers from the "},
        {"text": "IEEE Computational Intelligence Society (IEEE-CIS)", "tag": "organization"},
        {"text": " want to improve this figure, while also improving the customer experience. With higher "},
        {"text": "accuracy", "tag": "metric"},
        {"text": " fraud detection, you can get on with your chips without the hassle. "},
        {"text": "IEEE-CIS", "tag": "organization"},
        {"text": " works across a variety of AI and machine learning areas, including deep neural networks, "},
        {"text": "fuzzy systems, evolutionary computation, and swarm intelligence. "},
        {"text": "Today they’re partnering with the world’s leading payment service company, "},
        {"text": "Vesta Corporation", "tag": "organization"},
        {"text": ", seeking the best solutions for the fraud prevention industry, "},
        {"text": "and now you are invited to join the challenge. "},
        {"text": "In this competition, you’ll benchmark machine learning models on a challenging large-scale dataset. "},
        {"text": "The data comes from "},
        {"text": "Vesta", "tag": "organization"},
        {"text": "'s real-world e-commerce transactions "},
        {"text": "and contains a wide range of features from device type to product features. "},
        {"text": "You also have the opportunity to create new features to improve your results. "},
        {"text": "If successful, you’ll improve the efficacy of fraudulent transaction alerts for millions of people "},
        {"text": "around the world, helping hundreds of thousands of businesses reduce their "},
        {"text": "fraud loss", "tag": "metric"},
        {"text": " and increase their "},
        {"text": "revenue", "tag": "metric"},
        {"text": ". And of course, you will save party people just like you the hassle of "},
        {"text": "false positives", "tag": "metric"},
        {"text": "."}
    ],
    [
        {"text": "How much camping gear will one store sell each month in a year? "},
        {"text": "To the uninitiated, calculating sales at this level may seem as difficult as predicting the weather. "},
        {"text": "Both types of forecasting rely on science and historical data. "},
        {"text": "While a wrong weather forecast may result in you carrying around an umbrella on a sunny day, "},
        {"text": "inaccurate business forecasts could result in actual or opportunity losses. "},
        {"text": "In this competition, in addition to traditional forecasting methods you’re also challenged to use "},
        {"text": "machine learning to improve forecast "},
        {"text": "accuracy", "tag": "metric"},
        {"text": ". The Makridakis Open Forecasting Center (MOFC) at the "},
        {"text": "University of Nicosia", "tag": "organization"},
        {"text": " conducts cutting-edge forecasting research and provides business forecast training. "},
        {"text": "It helps companies achieve accurate predictions, estimate the levels of uncertainty, "},
        {"text": "avoid costly mistakes, and apply best forecasting practices. "},
        {"text": "The MOFC is well known for its Makridakis Competitions, the first of which ran in the 1980s. "},
        {"text": "In this competition, the fifth iteration, you will use hierarchical sales data from Walmart, "},
        {"text": "the world’s largest company by "},
        {"text": "revenue", "tag": "metric"},
        {"text": ", to forecast daily sales for the next 28 days. "},
        {"text": "The data covers stores in three US states (California, Texas, and Wisconsin) "},
        {"text": "and includes item level, department, product categories, and store details. "},
        {"text": "In addition, it has explanatory variables such as "},
        {"text": "price, promotions, day of the week, and special events. "},
        {"text": "Together, this robust dataset can be used to improve forecasting "},
        {"text": "accuracy", "tag": "metric"},
        {"text": ". If successful, your work will continue to advance the theory and practice of forecasting. "},
        {"text": "The methods used can be applied in various business areas, such as setting up appropriate "},
        {"text": "inventory or service levels. Through its business support and training, "},
        {"text": "the MOFC will help distribute the tools and knowledge so others can achieve more accurate "},
        {"text": "and better calibrated forecasts, reduce waste and be able to appreciate uncertainty and its risk "},
        {"text": "implications."}
    ],
    [
        {"text": "Nothing ruins the thrill of buying a brand new car more quickly than seeing your new insurance bill. "},
        {"text": "The sting’s even more painful when you know you’re a good driver. "},
        {"text": "It doesn’t seem fair that you have to pay so much if you’ve been cautious on the road for years. "},
        {"text": "Porto Seguro", "tag": "organization"},
        {"text": ", one of Brazil’s largest auto and homeowner insurance companies, completely agrees. "},
        {"text": "Inaccuracies in car insurance company’s claim predictions raise the cost of insurance for "},
        {"text": "good drivers and reduce the price for bad ones. "},
        {"text": "In this competition, you’re challenged to build a model that predicts the probability that "},
        {"text": "a driver will initiate an auto insurance claim in the next year. While "},
        {"text": "Porto Seguro", "tag": "organization"},
        {"text": " has used machine learning for the past 20 years, "},
        {"text": "they’re looking to Kaggle’s machine learning community to explore new, more powerful methods. "},
        {"text": "A more accurate prediction will allow them to further tailor their prices, and hopefully "},
        {"text": "make auto insurance coverage more accessible to more drivers."}
    ]
]
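Each document in `NER_DATA` is a list of text segments, and a segment is an entity only when it carries a `"tag"` key whose value matches a `"name"` in `NER_TAGS`. The sketch below shows one way such a document could be flattened into a single string with character-offset entity spans, rejecting unknown tags along the way. The helper `to_spans` and the `SAMPLE_TAGS` / `SAMPLE_DOC` values are hypothetical and are not part of this module; the sample only mirrors the shape of the data above.

```python
from typing import Dict, List, Tuple

# Hypothetical miniature sample with the same shape as NER_TAGS / NER_DATA.
SAMPLE_TAGS: List[Dict[str, str]] = [
    {"name": "organization", "label": "Organization", "color": "#F1CBCB"},
    {"name": "metric", "label": "Metric", "color": "#CAEACA"},
]
SAMPLE_DOC: List[Dict[str, str]] = [
    {"text": "At "},
    {"text": "Santander", "tag": "organization"},
    {"text": " our mission is to help people and businesses prosper."},
]


def to_spans(
    segments: List[Dict[str, str]],
    tags: List[Dict[str, str]],
) -> Tuple[str, List[Tuple[int, int, str]]]:
    """Join segments into one string and collect (start, end, tag) spans.

    Offsets are character positions in the joined string; `end` is exclusive.
    Raises ValueError if a segment uses a tag name that is not declared.
    """
    known = {tag["name"] for tag in tags}
    parts: List[str] = []
    spans: List[Tuple[int, int, str]] = []
    offset = 0
    for segment in segments:
        text = segment["text"]
        tag = segment.get("tag")  # untagged segments have no "tag" key
        if tag is not None:
            if tag not in known:
                raise ValueError(f"unknown tag: {tag!r}")
            spans.append((offset, offset + len(text), tag))
        parts.append(text)
        offset += len(text)
    return "".join(parts), spans


text, spans = to_spans(SAMPLE_DOC, SAMPLE_TAGS)
```

Under these assumptions the same function could be applied to every document, for example `[to_spans(doc, NER_TAGS) for doc in NER_DATA]`, which also serves as a quick consistency check of the annotations.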