I was trying to use the Field Descriptions feature to help LLMs understand my dataset. I wrote a data description function that builds a dictionary of information about the dataset, then passed it to pandasai through field descriptions like this:
data = preview_data(df)
# define a connector
connector = PandasConnector({"original_df": df}, name='My Connector', field_descriptions=data)
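For readers following along, here is a minimal sketch of what a description function of this kind could look like. The body of `preview_data` is not shown in this issue, so everything below (the `max_unique` parameter, the wording of the sentence, the dtype check) is an assumption made for illustration, not the author's actual code.

```python
import pandas as pd


def preview_data(df: pd.DataFrame, max_unique: int = 3) -> dict[str, str]:
    """Hypothetical helper: build one description string per column."""
    descriptions = {}
    for column in df.columns:
        series = df[column]
        # Report string-like columns as "string"; otherwise fall back to the dtype name.
        kind = "string" if pd.api.types.is_string_dtype(series) else str(series.dtype)
        # Keep only the first few distinct non-null values, in order of appearance.
        samples = ", ".join(str(v) for v in series.dropna().unique()[:max_unique])
        descriptions[str(column)] = (
            f"The {column} column contains {kind} values. "
            f"The unique values are: {samples}."
        )
    return descriptions


df = pd.DataFrame({"时间": ["2023-6-14", "2022-4-22", "2022-11-5"]})
data = preview_data(df)
print(data)
```

The resulting dictionary keeps the Chinese column name as an ordinary Python string; no escaping happens at this step.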
Part of my `data` looks like this:
{'时间': 'The 时间 column contains string values. The unique values are: 2023-6-14, 2022-4-22, 2022-11-5.'}
As you can see, there are some Chinese characters, but at the prompt generation stage they were not decoded, so they look like this:
To address the issue with Chinese characters not being properly decoded during the prompt generation stage in pandasai, it's crucial to understand the encoding of your original data source and ensure it's UTF-8 or compatible. Additionally, knowing the Python version you're using and whether this issue occurs with other non-ASCII characters can help pinpoint the problem. A minimal, reproducible example of the code causing this issue would also be invaluable for finding a solution. This information will enable a more targeted approach to solving the decoding issue with Chinese characters.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
System Info
pandasai == 2.0.43
python == 3.11
🐛 Describe the bug
In the generated prompt the column name appears as "\u65F6\u95F4", which makes the LLM much more confused.
Is there any way to solve this problem? Any suggestions would be appreciated!
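The escaped form quoted above matches what Python's standard `json.dumps` produces by default, since `ensure_ascii=True` turns every non-ASCII character into a `\uXXXX` sequence. Whether pandasai serializes field descriptions this way internally is an assumption here, not something confirmed in this thread; the sketch below only reproduces the symptom with the standard library and shows the effect of `ensure_ascii=False`.

```python
import json

# Same shape as the field descriptions above, shortened for the example.
field_descriptions = {"时间": "The 时间 column contains string values."}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(field_descriptions)

# With ensure_ascii=False the characters are kept as-is.
readable = json.dumps(field_descriptions, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings decode back to the same dictionary, so no data is lost;
# only the text that ends up in the prompt differs.
assert json.loads(escaped) == json.loads(readable) == field_descriptions
```

If the prompt is indeed built through a JSON dump, the escapes are valid JSON but harder for an LLM to read than the literal characters, which would explain the confusion described above.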