pd.read_csv: handling data that contains the comma delimiter


quantapia 2021. 3. 12. 15:55

stackoverflow.com/questions/44786415/read-csv-with-extra-commas-and-no-quotechar-with-pandas

Read CSV with extra commas and no quotechar with Pandas?


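from io import StringIO

import pandas as pd

# Sample data: the Text column contains commas, and no quote character is used.
s = '''ID,Level,QID,Text,ResponseID,responseText,date_key
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00'''

# A regex separator is handled by the Python parsing engine, so request it explicitly.
# ',(?!\s)' splits only on commas that are not followed by whitespace.
df = pd.read_csv(StringIO(s), sep=r',(?!\s)', engine='python')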
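To check that the regex separator kept the commas inside the Text column, the parsed frame can be inspected. This is a minimal sketch that reuses the sample data above; the variable name texts is introduced here only for illustration.

```python
from io import StringIO

import pandas as pd

s = '''ID,Level,QID,Text,ResponseID,responseText,date_key
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00'''

# Split only on commas that are not followed by whitespace.
df = pd.read_csv(StringIO(s), sep=r',(?!\s)', engine='python')

# Every row should land in the 7 header columns,
# and the Text values should still contain their commas.
texts = df['Text'].tolist()
print(df.shape)
print(texts)
```

Note that this only works because, in this data, real delimiters are never followed by a space and commas inside the text always are.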

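The (?!\s) part is a negative lookahead: it makes the pattern match only commas that are not immediately followed by whitespace, so the commas inside the text (which are followed by a space) are left alone.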
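The effect of the lookahead can be seen on a single row without pandas. The sketch below compares a plain split on every comma with a split on the same regex, using one line from the sample data; the names naive_fields and regex_fields are illustrative.

```python
import re

line = '375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00'

# Splitting on every comma also breaks the question text apart.
naive_fields = line.split(',')

# Splitting only on commas not followed by whitespace keeps the text whole.
regex_fields = re.split(r',(?!\s)', line)

print(len(naive_fields))
print(len(regex_fields))
print(regex_fields[3])
```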

regex1(?=(regex2)) : Positive Lookahead : matches regex1 only if it is followed by a match for regex2
regex1(?!(regex2)) : Negative Lookahead : matches regex1 only if it is not followed by a match for regex2
(?<=(regex2))regex1 : Positive Lookbehind : matches regex1 only if it is preceded by a match for regex2
(?<!(regex2))regex1 : Negative Lookbehind : matches regex1 only if it is not preceded by a match for regex2
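The four lookaround forms can be tried with Python's re module. This is a small sketch on made-up strings (the prices and quantities are hypothetical); note that Python's re requires the lookbehind pattern to have a fixed width.

```python
import re

prices = '10 USD, 20 EUR, 30 USD'
order = 'cost $40, qty 7'

# Positive lookahead: numbers followed by " USD".
pos_ahead = re.findall(r'\d+(?= USD)', prices)

# Negative lookahead: whole numbers not followed by " USD".
# The \b anchors stop the engine from matching part of a number.
neg_ahead = re.findall(r'\b\d+\b(?! USD)', prices)

# Positive lookbehind: numbers preceded by "$".
pos_behind = re.findall(r'(?<=\$)\d+', order)

# Negative lookbehind: whole numbers not preceded by "$".
neg_behind = re.findall(r'(?<!\$)\b\d+', order)

print(pos_ahead, neg_ahead, pos_behind, neg_behind)
```

In every case the lookaround text itself is not part of the match; it only decides whether the match is allowed.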

https://unlimitedpower.tistory.com/entry/정규표현식-이것이-고급이다-Positive-Negative-Lookahead-Lookbehind
