[PostgreSQL] sqlalchemy 라이브러리를 활용한 데이터 추출 실습

카테고리 없음

[PostgreSQL] sqlalchemy 라이브러리를 활용한 데이터 추출 실습

ParkS2 2025. 3. 28. 13:45

이번 포스트에는 Render Cloud Platform 에서 PostgreSQL을 생성하고 데이터를 적재 후, python에서 sqlalchemy 라이브러리를 활용해서 데이터를 추출하는 실습 과정을 포스팅하려고 한다.

1. Render Cloud Platform 에서 PostgreSQL을 생성

2. Python에서 csv파일을 데이터베이스에 적재하기

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy import text

# 1. 데이터베이스 연결
DATABASE_URL = "_____"
engine = create_engine(DATABASE_URL)

# 2. CSV 경로
csv_path = r"train.csv"
df = pd.read_csv(csv_path)

#PostgreSQL로 업로드
df.to_sql("people_data", engine, index=False, if_exists="replace")

3. 칼럼명 전처리

칼럼명 전처리를 하는이유

SQL에서는 공백이나 괄호가 있는 컬럼명을 쓸 때 따옴표가 필요함 / Python 코드로 쿼리할 때 헷갈리고 오류 잘 남
to_sql()로 테이블 만들면 컬럼명이 소문자로 자동 저장되기 때문에, 파이썬에서 쿼리를 작성할때는 소문자로 작성해야함.

df.columns = [col.strip().replace(" ", "_").replace("(", "").replace(")", "").lower() for col in df.columns]

df.columns = [col.strip().replace(" ", "_").replace("(", "").replace(")", "") for col in df.columns]

4. 데이터 추출

이렇게 하면 Render Cloud Platform 의 PostgreSQL 에 데이터가 적재되게 된다.

그 이후로는 다음과 같은 형식으로 """___""" 안에 쿼리를 넣어서 데이터를 추출해주면 된다.

with engine.connect() as conn:
    result = conn.execute(text("""
    
    """))
    for row in result:
        print(row)

1. 연소득 상위 10명 추출

with engine.connect() as conn:
    result = conn.execute(text("""
        SELECT age, gender, education_status
        FROM people_data
        ORDER BY income DESC
        LIMIT 10
    """))
    for row in result:
        print(row)

(46, 'M', 'Doctorate degree')
(40, 'F', 'Associates degree (Vocational)')
(25, 'F', 'High graduate')
(49, 'F', 'High Freshman')
(44, 'F', 'Middle (7-8)')
(60, 'F', 'High graduate')
(62, 'F', 'High graduate')
(27, 'F', 'High graduate')
(37, 'M', 'High graduate')
(34, 'F', 'High graduate')

2. 고용 상태별 평균 소득, 평균 근무 주(연간), 평균 손실 금액

with engine.connect() as conn:
    result = conn.execute(text("""
        SELECT
        employment_status,
        ROUND(AVG(income), 2) AS avg_income,
        ROUND(AVG(working_week_yearly), 2) AS avg_weeks,
        ROUND(AVG(losses), 2) AS avg_losses
        FROM people_data
        GROUP BY employment_status
        ORDER BY avg_income DESC;
    """))
    for row in result:
        print(row)

('Part-Time (Usually Part-Time)', Decimal('877.91'), Decimal('45.39'), Decimal('91.82'))
('Choice Part-Time', Decimal('792.87'), Decimal('44.57'), Decimal('34.91'))
('Part-Time (Usually Full-Time)', Decimal('757.97'), Decimal('42.19'), Decimal('55.90'))
('Full-Time', Decimal('746.93'), Decimal('45.72'), Decimal('48.26'))
('Children or Armed Forces', Decimal('488.23'), Decimal('31.34'), Decimal('36.71'))
('Seeking Part-Time', Decimal('367.58'), Decimal('30.68'), Decimal('0.00'))
('Seeking Full-Time', Decimal('146.60'), Decimal('30.29'), Decimal('17.26'))
('Not Working', Decimal('0.00'), Decimal('3.80'), Decimal('28.04'))

3. 연소득 상위 10%에 해당하는 사람들중 어떤 고용상태가 가장 빈도가 많은지 추출

with engine.connect() as conn:
    result = conn.execute(text("""
WITH top10_income AS (
    SELECT *
    FROM people_data
    WHERE income >= (
        SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY income)
        FROM people_data
    )
)
SELECT employment_status, COUNT(*) AS count
FROM top10_income
GROUP BY employment_status
ORDER BY count DESC
LIMIT 1;
    """))
    for row in result:
        print(row)

('Children or Armed Forces', 965)

4. 대졸 이상 & 주 40주 이상 근무하면서 소득이 평균 이하인 사람 중 남녀 비율

with engine.connect() as conn:
    result = conn.execute(text("""
WITH filtered AS (
    SELECT *
    FROM people_data
    WHERE education_status ILIKE '%college%'
      AND working_week_yearly >= 40
      AND income <= (SELECT AVG(income) FROM people_data)
)
SELECT gender, COUNT(*) AS count
FROM filtered
GROUP BY gender;
    """))
    for row in result:
        print(row)

('M', 496)
('F', 609)

5. 각 출생 국가별 평균 소득 상위 5개 국가

with engine.connect() as conn:
    result = conn.execute(text("""
SELECT birth_country, ROUND(AVG(income), 2) AS avg_income
FROM people_data
GROUP BY birth_country
ORDER BY avg_income DESC
LIMIT 5;
    """))
    for row in result:
        print(row)

('Trinadad&Tobago', Decimal('1115.00'))
('Philippines', Decimal('836.47'))
('England', Decimal('835.87'))
('Peru', Decimal('793.41'))
('Ireland', Decimal('755.19'))