[Big Data Analysis Engineer (빅데이터분석기사)] Practical Exam, Round 4, Task Type 1 Solutions (Python)
EveningPrimrose
2023. 6. 16. 18:49
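Problem 1
- Compute the absolute difference between the third quartile and the first quartile of the age column, then drop the decimal part and print the result as an integer
- data_path : ../input/bigdatacertificationkr/basic1.csv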
import pandas as pd
df = pd.read_csv("../input/bigdatacertificationkr/basic1.csv")
# print("1st quartile : ", df['age'].quantile(0.25))
# print("3rd quartile : ", df['age'].quantile(0.75))
result = abs(df['age'].quantile(0.25) - df['age'].quantile(0.75))
# print("Absolute difference : ", result)
print(int(result))  # int() truncates the decimal part
# Alternative attempt: it stores the scalar in every row and rounds instead of truncating
# df['age_quantile'] = df['age'].quantile(.75) - df['age'].quantile(.25)
# ans = round(abs(df['age_quantile']), 0)
# print(ans)
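The problem asks for truncation, not rounding, and the two can give different answers. Here is a minimal sketch of the difference. The `toy` DataFrame is made-up data standing in for the age column of basic1.csv, so the numbers are not the exam answer.

```python
import pandas as pd

# Hypothetical toy data; NOT the contents of basic1.csv
toy = pd.DataFrame({"age": [10.0, 20.25, 30.0, 42.0, 55.0]})

q1 = toy["age"].quantile(0.25)  # default linear interpolation
q3 = toy["age"].quantile(0.75)
iqr = abs(q3 - q1)

truncated = int(iqr)   # drops the decimal part, as the problem requires
rounded = round(iqr)   # rounds to the nearest integer instead

print(iqr, truncated, rounded)
```

With this toy data the quartile gap has a fractional part above 0.5, so `int()` and `round()` disagree, which is why the solution above uses `int(result)`.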
Problem 2
- Count the rows where the ratio (loves + wows) / reactions is greater than 0.4 and less than 0.5, and the type column is 'video'
- data_path : ../input/big-data-analytics-certification-kr-2022/fb.csv
import pandas as pd
df = pd.read_csv("../input/big-data-analytics-certification-kr-2022/fb.csv")
cond1 = (df['loves'] + df['wows']) / df['reactions'] > 0.4
cond2 = (df['loves'] + df['wows']) / df['reactions'] < 0.5
cond3 = df['type'] == 'video'
print(len(df[cond1 & cond2 & cond3]))
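The same filter can be written by computing the ratio once and summing the boolean mask. The sketch below uses a made-up `toy` DataFrame (not fb.csv) and includes a row with zero reactions to show that the resulting NaN ratio simply fails both comparisons.

```python
import pandas as pd

# Hypothetical toy data; NOT the contents of fb.csv
toy = pd.DataFrame({
    "type": ["video", "video", "photo", "video", "video"],
    "reactions": [100, 10, 100, 0, 200],
    "loves": [30, 1, 40, 0, 90],
    "wows": [15, 1, 5, 0, 10],
})

# Compute the ratio once; 0 / 0 becomes NaN, which compares as False
ratio = (toy["loves"] + toy["wows"]) / toy["reactions"]

# Strict inequalities: exactly 0.4 or exactly 0.5 is excluded
mask = (ratio > 0.4) & (ratio < 0.5) & (toy["type"] == "video")

count = int(mask.sum())  # True counts as 1
print(count)
```

Only the first row qualifies here: the photo row has the right ratio but the wrong type, and the last row sits exactly on the 0.5 boundary.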
Problem 3
- Count the rows where date_added falls in January 2018 and country is United Kingdom alone (sole production)
- data_path : ../input/big-data-analytics-certification-kr-2022/nf.csv
# Solution 1
import pandas as pd
df = pd.read_csv("../input/big-data-analytics-certification-kr-2022/nf.csv")
cond1 = df['country'] == "United Kingdom"
df['date_added'] = pd.to_datetime(df['date_added'])
df['year'] = df['date_added'].dt.year
df['month'] = df['date_added'].dt.month
cond2 = df['year'] == 2018
cond3 = df['month'] == 1
print(len(df[cond1 & cond2 & cond3]))
# Solution 2
import pandas as pd
df = pd.read_csv("../input/big-data-analytics-certification-kr-2022/nf.csv")
cond1 = df['country'] == "United Kingdom"
df['date_added'] = pd.to_datetime(df['date_added'])
cond2 = df['date_added'] >= '2018-1-1'
cond3 = df['date_added'] <= '2018-1-31'
print(len(df[cond1 & cond2 & cond3]))
# Solution 3: using datetime + between
import pandas as pd
df = pd.read_csv("../input/big-data-analytics-certification-kr-2022/nf.csv")
cond1 = df['country'] == "United Kingdom"
df['date_added'] = pd.to_datetime(df['date_added'])
cond2 = df['date_added'].between('2018-1-1', '2018-1-31')
print(len(df[cond1 & cond2]))
# Solution 4
import pandas as pd
df = pd.read_csv("../input/big-data-analytics-certification-kr-2022/nf.csv")
cond1 = df['country'] == "United Kingdom"
df['date_added'] = df['date_added'].fillna("")
str1 = "2018"
str2 = "January"
cond2 = df['date_added'].str.contains(str1)
cond3 = df['date_added'].str.contains(str2)
print(len(df[cond1 & cond2 & cond3]))
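To see why the exact match `== "United Kingdom"` matters for "sole production", here is a small sketch on made-up rows. The `toy` DataFrame and the `"%B %d, %Y"` date format are assumptions for illustration; the real nf.csv may format dates differently.

```python
import pandas as pd

# Hypothetical toy data; NOT the contents of nf.csv
toy = pd.DataFrame({
    "country": ["United Kingdom", "United Kingdom, United States",
                "United Kingdom", "India", "United Kingdom"],
    "date_added": ["January 5, 2018", "January 9, 2018",
                   "February 1, 2018", "January 20, 2018", None],
})

# Assumed date format; a missing value becomes NaT
added = pd.to_datetime(toy["date_added"], format="%B %d, %Y")

# Exact equality excludes co-productions such as "United Kingdom, United States"
uk_only = toy["country"] == "United Kingdom"

# NaT yields NaN for year/month, so those rows compare as False
in_jan_2018 = (added.dt.year == 2018) & (added.dt.month == 1)

count = int((uk_only & in_jan_2018).sum())
print(count)
```

Only the first row passes both conditions. A substring check such as `str.contains("United Kingdom")` would also have counted the co-production row, which is not what the problem asks for.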