What's in Your Data: Data Profiler - Austin Walters, Jeremy Goodsitt | PyData Global 2021

What's in Your Data: Data Profiler - an Open Source Solution to Explain Your Data

Speakers: Austin Walters, Jeremy Goodsitt

Summary
Data understanding is crucial for most machine learning applications. As data scientists and engineers, we need to answer these questions for every project: Is our data secure? What is in our data? How do we monitor data properties over time? The DataProfiler, an open source project from Capital One, is a Python library designed to facilitate data analysis, monitoring, and sensitive data detection.

Description
DataProfiler was designed to accept a wide range of data formats, including CSV, Avro, Parquet, JSON, text, and pandas DataFrames. Whether the data is structured, semi-structured, or unstructured, the library can identify the schema, statistics, and entities in the data. In addition, the DataProfiler provides a cutting-edge pre-trained deep learning model to efficiently identify sensitive information (or PII, such as customer name