pandas-profiling for Data Analysis

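Data analysis requires both statistical knowledge and programming skill.
If you are short on either, you have little choice but to rely on code that someone else has written.
Many libraries have appeared over the years, but each covered only a limited set of features; pandas-profiling can be seen as a library that brings them together.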

Key features (quoted verbatim from the original)

Type inference: detect the types of columns in a dataframe.
Essentials: type, unique values, missing values
Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range
Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
Most frequent values
Histogram
Correlations highlighting of highly correlated variables, Spearman, Pearson and Kendall matrices
Missing values matrix, count, heatmap and dendrogram of missing values
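
To give a feel for what these items mean, here is a minimal sketch that computes a few of them by hand with plain pandas. It is not how pandas-profiling works internally; the sample data, the injected missing values, and all variable names are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 100 rows, 3 numeric columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 3)), columns=['a', 'b', 'c'])
df.loc[::10, 'b'] = np.nan  # inject 10 missing values into column 'b'

# Essentials: type, unique values, missing values
dtypes = df.dtypes
n_unique = df.nunique()
n_missing = df.isna().sum()

# Quantile statistics: Q1, median, Q3, interquartile range, range
quantiles = df.quantile([0.25, 0.5, 0.75])
iqr = quantiles.loc[0.75] - quantiles.loc[0.25]
value_range = df.max() - df.min()

# Descriptive statistics
summary = pd.DataFrame({
    'mean': df.mean(),
    'std': df.std(),
    'sum': df.sum(),
    'skewness': df.skew(),
    'kurtosis': df.kurt(),
})
summary['cv'] = summary['std'] / summary['mean']  # coefficient of variation

# Correlation matrices (missing values are ignored pairwise)
pearson = df.corr(method='pearson')
spearman = df.corr(method='spearman')

print(summary)
print(n_missing)
```

The report generated by pandas-profiling gathers statistics of this kind, along with histograms and missing-value plots, so you do not have to write them one by one.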

Installation

Using pip

pip install pandas-profiling[notebook,html]
or
pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

Using conda

conda install -c conda-forge pandas-profiling

Usage

import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

# Create DataFrame
df = pd.DataFrame(np.random.rand(100, 5), columns=['a', 'b', 'c', 'd', 'e'])
# Profiling Report
profile = ProfileReport(df, title='Pandas Profiling Report', style={'full_width': True})
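
Once the profile object exists, the usual next step is to save it as an HTML file. The sketch below assumes that `ProfileReport` provides a `to_file` method that accepts an output path, which may differ between versions of the package. The helper `write_report`, the output file name, and the fallback when the package is missing are hypothetical additions for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100, 5), columns=['a', 'b', 'c', 'd', 'e'])


def write_report(frame, output_path):
    """Write an HTML profiling report; return False if the package is missing."""
    try:
        from pandas_profiling import ProfileReport
    except ImportError:
        print("pandas-profiling is not installed; skipping report generation")
        return False
    profile = ProfileReport(frame, title='Pandas Profiling Report')
    # Assumption: to_file accepts the target path of the HTML report
    profile.to_file(str(output_path))
    return True


# Hypothetical output location in the system temp directory
output_path = Path(tempfile.gettempdir()) / "pandas_profiling_report.html"
written = write_report(df, output_path)
```

Opening the resulting HTML file in a browser shows the full report.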

See the original for the rest.

Original: https://github.com/pandas-profiling/pandas-profiling?fbclid=IwAR2zPP5VvdYNkPLqpfCBuUMxstkVO6rGkcjmkUP9WcXRcWu21sPAZiAzfJo


License: Attribution-NonCommercial-ShareAlike
