
graphcodebert+pairwise ensemble 0.8462

Posted Sep 10, 2022

By mng


Starting from the codebert setup, I simply swapped codebert for graphcodebert in model.from_pretrained, then took a simple weighted ensemble with the pairwise model's results.
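A minimal sketch of what that swap might look like, assuming the Hugging Face transformers library and the public Hub checkpoints `microsoft/codebert-base` and `microsoft/graphcodebert-base`. The post does not name the exact checkpoints, and `CHECKPOINTS` and `load_encoder` are hypothetical names used only for illustration.

```python
# Hub checkpoint ids (assumed; the post does not say which ones were used)
CHECKPOINTS = {
    "codebert": "microsoft/codebert-base",
    "graphcodebert": "microsoft/graphcodebert-base",
}


def load_encoder(name="graphcodebert"):
    """Return (tokenizer, model) for the chosen checkpoint (hypothetical helper)."""
    # Imported inside the function so the lookup table above stays usable
    # even when transformers is not installed.
    from transformers import AutoModel, AutoTokenizer

    checkpoint = CHECKPOINTS[name]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    return tokenizer, model
```

Both checkpoints use the RoBERTa architecture, so changing the name passed to from_pretrained should be the only edit needed; the rest of the training code can stay as it was.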

Rank Ensemble

I saw people on the Kaggle leaderboard using this ratio (0.748 / 0.252), so I used it too.

  
import pandas as pd

# Blend weights for submission1 and submission2 (they sum to 1)
W_1, W_2 = 0.748, 0.252

# Read the two submissions (assumes both files list the notebooks in the same row order)
df_1 = pd.read_csv('submission1.csv')
df_2 = pd.read_csv('submission2.csv')

# Average each cell's position across the two orderings, then re-sort by the blended position
new_samples = []
for sample_idx in range(len(df_1)):
    # Map cell id -> position, e.g. {'0a226b6a': 0, ...}
    sample_1 = {k: v for v, k in enumerate(df_1.iloc[sample_idx]['cell_order'].split(' '))}
    sample_2 = {k: v for v, k in enumerate(df_2.iloc[sample_idx]['cell_order'].split(' '))}
    for key in sample_1:
        sample_1[key] = sample_1[key] * W_1 + sample_2[key] * W_2
    new_samples.append(' '.join(k for k, _ in sorted(sample_1.items(), key=lambda x: x[1])))
df_1['cell_order'] = new_samples
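
To see what the blend does on a single notebook, here is a small self-contained sketch of the same position-averaging idea. The function name `blend_orders` and the toy cell ids are made up for illustration; only the 0.748 / 0.252 weights come from the snippet above.

```python
def blend_orders(order_a, order_b, w_a=0.748, w_b=0.252):
    """Blend two orderings of the same cell ids by weighted average position."""
    pos_a = {cell: i for i, cell in enumerate(order_a)}
    pos_b = {cell: i for i, cell in enumerate(order_b)}
    if pos_a.keys() != pos_b.keys():
        raise ValueError("both orderings must contain the same cell ids")
    # Blended position per cell; a lower score means earlier in the notebook
    score = {cell: w_a * pos_a[cell] + w_b * pos_b[cell] for cell in pos_a}
    # sorted() is stable, so ties keep the order they had in order_a
    return sorted(order_a, key=score.__getitem__)


# Toy example with hypothetical cell ids
order_a = ["c1", "c2", "c3", "c4"]
order_b = ["c1", "c3", "c4", "c2"]

blended = blend_orders(order_a, order_b)            # order_a dominates
blended_equal = blend_orders(order_a, order_b, 0.5, 0.5)
```

With the 0.748 / 0.252 weights the first ordering wins every disagreement in this toy case, while equal weights let the second ordering move c3 ahead of c2. This suggests the second model mostly acts as a tie-breaker at this ratio.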

Result

public: 0.8462

private: 0.8410 (holy shit…)

NLP, KAGGLE, AI4CODE

This post is licensed under CC BY 4.0 by the author.

